Обсуждение: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

Поиск

Список

Период

Сортировка

('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

06 октября 2022 г., 03:16:51

The doc for "quote_ident()" says this:

«
https://www.postgresql.org/docs/14/functions-string.html
Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only
ifnecessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are
properlydoubled. 
»

B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's
actualargument) and an identifier (what it returns). Some of you flatly reject (borrowing a phrase from Tom) the
distinctionbetween these two terms of art. Oh well… 

Try this:

create table dog$(n int); -- OK
create table $dog(n int); -- Bad
create table "$dog"(n int); -- OK

These outcomes are consistent with the rules that say when a proposed name needs to be double-quoted to form its
identifierin a SQL statement (or PL/pgSQL source text). 

So it's correct for this to return FALSE:

select '$dog' = quote_ident('$dog');

But it's incorrect w.r.t. "quotes are added only if necessary" for this to return FALSE:

select 'dog$' = quote_ident('dog$');

"format()" shows the same error when you use the %I placeholder. I suppose that "format()" and "quote_ident()" share
thesame underlying implementation. 

select format('What happens with %I?', 'dog'); -- double quotes are not added
select format('What happens with %I?', 'dog$'); -- double quotes are added

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

"David G. Johnston"

Дата:

06 октября 2022 г., 03:31:48

On Wed, Oct 5, 2022 at 5:17 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:

The doc for "quote_ident()" says this:

«
https://www.postgresql.org/docs/14/functions-string.html
Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled.
»

But it's incorrect w.r.t. "quotes are added only if necessary" for this to return FALSE:

select 'dog$' = quote_ident('dog$');

The documentation also says:

"Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable."

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

So I can see an argument for the existing behavior. It doesn't seem worth changing in any case. And I don't really see the documentation being improved by covering this corner case in detail when the current behavior is at least intuitive.

David J.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Tom Lane

Дата:

06 октября 2022 г., 04:20:26

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> So I can see an argument for the existing behavior.  It doesn't seem worth
> changing in any case.  And I don't really see the documentation being
> improved by covering this corner case in detail when the current behavior
> is at least intuitive.

quote_ident is a good bit more conservative than the core lexer
about what is an "identifier character" --- it considers all
non-ASCII characters as requiring quoting, too.

For typical uses of quote_ident, I think this is good future-proofing:
it makes it very much less likely that something quote_ident decides
not to quote would be rejected by some future PG version (not to mention
non-PG SQL databases).  So I'm not really in a hurry to change the code.
Maybe we should tweak the docs a bit.

            regards, tom lane

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Adrian Klaver

Дата:

06 октября 2022 г., 07:31:44

On 10/5/22 17:16, Bryn Llewellyn wrote:
> The doc for "quote_ident()" says this:
> 
> «
> https://www.postgresql.org/docs/14/functions-string.html
> Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added
onlyif necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are
properlydoubled.
 
> »
> 
> B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's
actualargument) and an identifier (what it returns). Some of you flatly reject (borrowing a phrase from Tom) the
distinctionbetween these two terms of art. Oh well…
 

What it returns is text, quoted if needed:

create table "$dog"(n int);

select  pg_typeof(quote_ident('$dog')), quote_ident('$dog');
  pg_typeof | quote_ident
-----------+-------------
  text      | "$dog"

The way I see is if it where an actual identifier then this:

select * from quote_ident('$dog');
  quote_ident
-------------
  "$dog"

would be equal to this:

select * from "$dog";
  n
---



-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Christophe Pettus

Дата:

06 октября 2022 г., 07:51:54

> On Oct 5, 2022, at 17:16, Bryn Llewellyn <bryn@yugabyte.com> wrote:
> B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's
actualargument) and an identifier (what it returns). 

There is no first-class "identifier" type in PostgreSQL, so a function can't "return an identifier."  It returns a
stringwhich might, when placed into a larger string and processed as SQL, be lexically correct as an identifier. 

To be useful, quote_ident() can't fail to quote a string in such a way that it's not a valid identifier to PostgreSQL.
Ifit quotes some strings that PostgreSQL would accept as identifiers without quotes, that's interesting, I guess, but
I'mnot sure I see how it is a bug. 

Pragmatically, what this function is for it to assemble SQL statements as strings.  Any review of its correctness needs
tobe based on a situation where it can't be used for that.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

06 октября 2022 г., 08:04:35

tgl@sss.pgh.pa.us wrote:

david.g.johnston@gmail.com writes:

So I can see an argument for the existing behavior. It doesn't seem worth changing in any case. And I don't really see the documentation being improved by covering this corner case in detail when the current behavior is at least intuitive.

quote_ident is a good bit more conservative than the core lexer about what is an "identifier character" --- it considers all non-ASCII characters as requiring quoting, too.

For typical uses of quote_ident, I think this is good future-proofing: it makes it very much less likely that something quote_ident decides not to quote would be rejected by some future PG version (not to mention non-PG SQL databases). So I'm not really in a hurry to change the code. Maybe we should tweak the docs a bit.

Is the use of *any* non-Latin character in the name of a database object a corner case? I appreciate that some outfits require Latin object names because they don't want to make it hard to have a global development staff. But not all outfits follow such a rule.

It's a pity that the doc reference that I started with and the second one that David mentioned:

https://www.postgresql.org/docs/14/functions-string.html

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

don't x-reference each other. I had no reason to suspect that what I read in the dedicated account of quote_ident was something of a sketch—and that it (loosely speaking) has to be interpreted in the light of the second one. I appreciate that the PG doc aims for maximally terse, full-on DRY accounts. But neither of the two accounts gives a hint that quote_ident is less permissive than are PG's SQL and PL/pgSQL parsers themselves. This would have fixed it for me:

[quote_ident] Returns its input argument string suitably rendered to be used as an identifier in an SQL statement string. Rendering is not always needed and in some cases the return value is identical to the input value. The rules for when rendering is needed, and what form it takes, are explained in [x-ref]. Here are four examples:

select quote_ident('eye'); → eye

select quote_ident($$ab"cd'ef$$); → "ab""cd'ef"

select quote_ident('øye'); → "øye"

select quote_ident('我的桌子'); → "我的桌子"

The third and fourth examples show that quote_ident is much stricter than the rules explained in [x-ref] demand. They explain that for example, the SQL identifier for this role name:

redaktør

is that bare name—with no rendering. Quote_ident's unnecessary strictness means that some tempting uses for the function are not viable. A test like this:

if 'proposed_name' <> quote_ident('proposed_name') then ...

will give a false negative for large classes of proposed names.

About:

good future-proofing: it makes it very much less likely that something quote_ident decides not to quote would be rejected by some future PG version

Does this imply a risk that a future PG version will go against the SQL standard and reject any non-latin name that is free of all punctuation characters, when used in the role of a SQL identifier, unless it's double quoted?

After all, I'd guess that a great deal of SQL text and PL/pgSQL source text is typed by hand—and you don't usually see double quotes delineating every SQL identifier when this isn't needed. (And you almost never see double quotes delineating a SQL identifier where this rendering is necessary.)

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

06 октября 2022 г., 08:48:48

adrian.klaver@aklaver.com wrote:

The way I see is if it where an actual identifier then this:

select * from quote_ident('$dog');

quote_ident
-------------
"$dog"

would be equal to this:

select * from "$dog";

I think that the clue here is to go into philosophical overdrive. SQL statements, and if-then-else programs, are platonic notions. Like in the famous example of the notion of Boston itself—as opposed to how it's denoted in different contexts. Some would refer to it by saying "Boston". Others would say "波士顿".

In our world, the phenomenon is illustrated by this (after authorizing as a superuser):

create role "my name" login;
create database db;
grant all on database db to "my name";

\c db "my name"

create schema s;
create table s."silly name"(n int);
select relname
from pg_class c inner join pg_roles r on c.relowner = r.oid
where r.rolname = 'my name';

This is the result (in "\t on" mode)

silly name

So that's *three* different ways to denote the platonic notion that I had in my head, of a certain table in a certain schema in a certain database, before I typed anything

What we deal with in our ordinary professional work is SQL texts, program source texts, within these, SQL identifier texts, and then the conventional display of the results of SQL and program execution. To emphasize the point about resulst display, try "\d s.*" in "\t off" mode. You'll see this:

But this SQL text:

drop table "s.silly name";

tells me that there's no such table. It's all a matter of convention. In an alternative universe, maybe manifest string constants are rendered, with no delineation, in red text; and identifiers are rendered, again with no delineation, in green text. In another universe, all SQL and program composition is done by talking. Manifest string constants are rendered by shouting; and identifiers are rendered in a whisper. And why not...

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

06 октября 2022 г., 08:58:04

> xof@thebuild.com wrote:
>
> There is no first-class "identifier" type in PostgreSQL, so a function can't "return an identifier."  It returns a
stringwhich might, when placed into a larger string and processed as SQL, be lexically correct as an identifier. 

It takes huge discipline always to say "the text of an identifier" when the context of discourse is established. But,
yes,I agree, when I wrote this: 

«
...the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's actual
argument)and an identifier (what it returns).  
»

that the context of discourse was indeed established. I should have made no such assumption and written this instead:

«
...the value of "quote_ident()" rests on the distinction between the text of a name (what you provide with the
function'sactual argument) and the text of an identifier (what it returns).  
»

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Gavan Schneider

Дата:

06 октября 2022 г., 11:49:58

On 6 Oct 2022, at 16:04, Bryn Llewellyn wrote:

Does this imply a risk that a future PG version will go against the SQL standard and reject any non-latin name that is free of all punctuation characters, when used in the role of a SQL identifier, unless it's double quoted?

From my perspective this thread seems to miss the essential purposes behind quote_ident(). It is part of processing external/user input —

Protecting from PostgreSQL which always maps everything to lower case before anything gets to the parser
Protection against SQL injection when processing input from outside the trusted perimeter

Expecting an arbitrary string to be equal to itself after it has been through string processing code is risky unless that processing is part of the design, and quote_ident() was never designed to be part of any such arrangement.

Expanding —

It is a complex question what happens to non-ASCII characters when they are mapped to lower case… sometimes this is a meaningful concept e.g., ∏ -> π, sometimes it is not, e.g., pick any Chinese/Korean/Japanese character. If the designer decides to use non-ASCII characters in the identifier they can… just double-quote those identifiers. If the designer wants to use camelCase ASCII they can, but the identifier will be camelcase inside the machine unless it was double quoted.

AFAIK we never really use quote_ident() except to process external input. As noted above this function is not designed to be part of an equality test when attempting system introspection, rather —

The simple quote_ident() function can also be used to wrap untrusted input so it will not mess with the parser. It is used with quote_literal() when building dynamic SQL statements from user (i.e., untrusted) input.

From my perspective any use of these function outside their scope is just that… outside their scope, with no promise this usage will work or comply with any current or future standard, or imply anything useful about pretty much anything.

Maybe I’m oversimplifying but I believe the current functions work and do their specific jobs, and have nothing to do with anything else. So there is no surprise for me in the subject line. There is mild surprise the question was asked.

BTW this ignores whether or not PG mapping everything that’s not quoted to lower case is standards compliant. This whole topic would be simpler if the case was left alone but that’s a long road ago and I believe most of the bridges have been burnt :)

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

Aw: Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Karsten Hilbert

Дата:

06 октября 2022 г., 12:16:30

What we deal with in our ordinary professional work is SQL texts, program source texts, within these, SQL identifier
texts,and then the conventional display of the results of SQL and program execution. To emphasize the point about
resulstdisplay, try "\d s.*" in "\t off" mode. You'll see this: 
 
              Table "s.silly name"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 n      | integer |           |          | 
 
But this SQL text:
 
drop table "s.silly name";
 
tells me that there's no such table.




And, indeed, there isn't. Perhaps there's a table s."silly name".

It is accidental if unfortunate that the <s.silly name> is quoted with ""'s in the \d output...

Karsten

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

07 октября 2022 г., 01:24:23

> Karsten.Hilbert@gmx.net wrote:
>
>> bryn@yugabyte.com wrote:
>>
>> What we deal with in our ordinary professional work is SQL texts, program source texts, within these, SQL identifier
texts,and then the conventional display of the results of SQL and program execution. To emphasize the point about
resulstdisplay, try "\d s.*" in "\t off" mode. 
>>
>> [Allow me to re-write my bext few words, for sport.]
>>
>> But a SQL statement with the following text representation
>>
>> drop table "s.silly name”
>>
>> when presented to the psql CLI as a text command in its language by appending a semi-colon causes that program to
respondwith some text that tells me that there's no such table. 
>
> And, indeed, there isn't. Perhaps there's a table s."silly name”. It is accidental if unfortunate that the <s.silly
name>is quoted with ""'s in the \d output... 

I believe that you and I see things the same way, Karsten. Yes, it does seem at times that some things in PG are
accidental—andsometimes prove to be unfortunate. 

Naturally, my questions to this list concern what know that I don’t understand. (Who knows what I think that I
understand—butdon’t. And who knows what I don’t even suspect is there waiting for me to try to understand.) To err is
human—andit’s human, too, to misunderstand something when the proper understanding seems to be counter-intuitive. In my
case,I developed my intuitions in the context of a different RDBMS with notions and terms of art that differ very
significantlyfrom PG’s—even though the core SQL syntax and semantics are deceptively similar. 

Maybe I should send posts to this list like this:

«
I just wrote and tested a PG implementation to do something I hadn’t done before. I was very impressed at how
straightforwardit was—and with how expressive of my intentions the text of my code seemed to be. Well done PG. 
»

I do very often have that experience. But I’ve never seen a contribution along those lines in this forum—and I’ve
formedthe impression that it would be out of place.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

07 октября 2022 г., 01:53:48

list.pg.gavan@pendari.org wrote:

bryn@yugabyte.com wrote:

Does this imply a risk that a future PG version will go against the SQL standard and reject any non-latin name that is free of all punctuation characters, when used in the role of a SQL identifier, unless it's double quoted?

From my perspective this thread seems to miss the essential purposes behind quote_ident(). It is part of processing external/user input… Expecting an arbitrary string to be equal to itself after it has been through string processing code is risky unless that processing is part of the design, and quote_ident() was never designed to be part of any such arrangement.

…If the designer decides to use non-ASCII characters in the identifier they can… just double-quote those identifiers.

AFAIK we never really use quote_ident() except to process external input… There is no surprise for me in the subject line. There is mild surprise the question was asked.

Here's why I fell into the trap that I did.

(1) I used the Oracle RDBMS for a very long time. Now I’m working hard to forget everything that I learned there and learn the PG way instead. I’m frequently caught out.

(2) ORCL provides a function with the same aim as quote_indent. Its input is the text of a name. And its output is the text of a legal SQL identifier for that name. The output *always* starts and ends with a double quote. And why not? Sometimes the bare text of a name is perfectly acceptable as the text of a SQL identifier—and yet it’s still perfectly acceptable in that way when it's surrounded with double quotes.

(3) The PG doc on quote_ident says this in large friendly letters:

Quotes are added only if necessary…

Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:

select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created

The doc might well have said that quotes are often added when they're not needed. I know that the doc also says "to be used as an identifier in an SQL statement string". But the reference doc for a function should give a precise and accurate specification of the rule that derives the output value from the input value(s). (There might well be performance caveats.) As long as this is reliable, then the user is free to use the function for any purpose where the specified behavior fits the use case.

(4) I do have a genuine use case. I know that, usually, it's best to preface a question to a forum like this with a clear statement of what you want to achieve. Well, in this case I wanted to achieve a full correct understanding of what quote_ident does. I think that I have this now. I accept that no change will ever be made to the actual behavior of quote_ident (for example, to have it always surround the return text with double quotes). And I accept that nobody is motivated to change the docs to stop PG neophytes like me being mislead by taking the extant wording at face value.

B.t.w., my specific use case is best solved by spending just a little effort to write my own function to check my own, context-specific, spec of what defines a bad name. Of course, it uses PG's native regular expression functionality. I was simply distracted from that effort when I briefly wondered if the ready-made quote_ident might save me some effort. I very soon realized that it would not. But that the same time, I couldn't help noticing that its behavior was at odds with the doc. This ought to surprise anybody but the most cynical amongst us. So naturally I asked about this.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

"David G. Johnston"

Дата:

07 октября 2022 г., 02:01:11

On Thu, Oct 6, 2022 at 3:53 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:

(3) The PG doc on quote_ident says this in large friendly letters:

Quotes are added only if necessary…

Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:

select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created

Yep, and that is precisely what would make for a good bug report. Pointing out that "if necessary" does not indeed match up with the behavior.

I suspect it is likely to get changed - everything else being discussed just detracts attention from it.

David J.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Bryn Llewellyn

Дата:

08 октября 2022 г., 03:16:45

david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

(3) The PG doc on quote_ident says this in large friendly letters:

Quotes are added only if necessary…

Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:

select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created

Yep, and that is precisely what would make for a good bug report. Pointing out that "if necessary" does not indeed match up with the behavior. I suspect it is likely to get changed - everything else being discussed just detracts attention from it.

*BRIEFLY*

What does "make for a good bug report" mean, David? Is it:

(1.1) You, David, or somebody else who has been officially recognized as a PG Contributor (https://www.postgresql.org/community/contributors/) will file the bug, granting it credibility with their imprimatur?

or (1.2) I, Bryn, should file the bug.

About "I suspect it is likely to get changed", do you mean:

(2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to describe accurately, precisely, and yet still tersely" for "unpredictable".)

(2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and accurately? And for this option, the next question is "What's the spec of the changed implementation?"

Notice that the issue is broader than just quote_ident, as this test shows:

prepare x(text) as select format('« %s », gives « %I ».', $1::text, $1::text);

execute x('dog');
execute x('Dog');
execute x('农民');

The same over-zealous double-quoting that quote_ident shows for 农民 is shown by format. Presumably they share the same underlying implementation (but, surprisingly, don't re-use the actual SQL parser code). Option 2.1 implies using the same wording for what provokes double-quoting for each function. I'd make a similar argument for option 2.2.

*MORE DETAIL*

About option (2.2), I mentioned that ORCL's equivalent to quote_ident implements a simpler rule: the text of the identifier that it returns is always surrounded with double quotes, whether or not doing this is necessary. The ORCL scheme relies on the fact that double-quoting when this isn't necessary is in no way harmful. Here’s pseudocode (presented as tested PL/pgSQL) for what a patched C implementation of quote_ident might do:

create function quote_ident_2(name in text)
  returns text
  language plpgsql
as $body$
declare
  i0     text not null := quote_ident(name);
  i1     text not null := regexp_replace(regexp_replace(i0, '^"', ''), '"$', '');
  ident  text not null := case(i1 = i0)
  when true then '"'||i0||'"'
  else   i0
  end case;
begin
  return ident;
end;
$body$;

Re David’s

everything else being discussed just detracts attention from it.

I’m not convinced. The discussion has shown that some people are somewhat confused. For example, it was suggested that a name like this:

农民

ought to be double-quoted. A simple test shows that this isn’t the case. And it helps if everybody is clear about that.

There's also the question of use-cases. I've been forced to think a lot about SQL injection over the years. It's immediately obvious from reading any of the skimpiest blogs on the topic that the root cause is always faulty code that's been written by a confused developer. But it's very rare to see an account of the root cause whose wording uses carefully defined terms of art. (In fact, the notion of defining and using such terms of art has been resisted by contributors to this list.) If a future PG shipped with a built-in function like this:

function is_exotic(name in text) returns boolean

...with, of course, excellent documentation, then an outfit that decided to outlaw the use of exotic names (and this is almost always how the outcome emerges) could police adherence to the rule, database wide, (and cluster wide for global phenomena) with a few trivial tests against the relevant catalog relations, like this:

do $body$
begin
assert not exists (select 1 from pg_class where is_exotic(relname));
end;
$body$;

A shipped "is_exotic" function would bring the secondary, but by no means insignificant, benefit, that the case for using “format” with "%I" or its "quote_ident" cousin could be grounded upon solidly defined, and named, notions. Like this example shows:

do $body$
declare
  stmt constant  text   not null := 'create table %s(n int);';
  n   text   not null := '';

  simple_names constant text[] not null := array['farmers', 'bønder', '农民', '农民、儿童', '农民，儿童'];

  exotic_names constant text[] not null := array['T', 'farmers and children', '"x', 'y"', '农民 & 儿童'];
begin
  foreach n in array simple_names loop
  assert not is_exotic(n), 'Problem with '||n;
  end loop;

  foreach n in array exotic_names loop
  assert is_exotic(n), 'Problem with '||n;
  end loop;
end;
$body$;

Who knew that using Chinese punctuation characters in a name (they have their own code points, and look a bit different when you look closely) don't make the name exotic...

*THE OTHER CHOICE (for option 2.2): make it true that quote_ident and format with %I surround the input with double quotes exactly and only when this is necessary*

Choosing between the two alternatives (the ORCL rule or dependable parsimony) wouldn't matter much if PG comes with a built-in is_exotic function. But if the user has to implement their own, then the implementation would be far, far simpler and therefore more reliable if it were decided to implement dependable parsimony.

The following code shows what I mean.

create function is_exotic(name in text)
returns boolean
language plpgsql
as $body$
declare
ident constant text not null := quote_ident_2(name);
stripped_ident constant text not null := regexp_replace(regexp_replace(ident, '^"', ''), '"$', '');
stmt constant text not null := 'deallocate %s';
begin
/*
This is the easy special case. The input is entirely letters inascii(7), or some
other character set that distinguishes between upper and lower case, and has at
least one upper case letter. This literal actual argument 'Dogs' is a sufficient
example. Here, the value of "stripped_ident" will be equal to the value of the
input. It's essential to test for this special case because the general test,
below, is (famously) happy to execute "deallocate Dog" and the like.
*/
if lower(name) <> stripped_ident then
return true;
end if;

/*
This is the harder general case. The input is anything at all that passes the
first test and that causes no error when used "as is" in the role of a SQL identifier.
Without access to the actual implementation of the SQL parser, the simplest practical
way to check this property is to use an empirical test. The "deallocate" statement is
fairly lightweight. The approach seems to bring no measureable performance penalty.
*/
begin
execute format(stmt, stripped_ident);
exception when invalid_sql_statement_name then null; end;
return false;

exception when syntax_error then
return true;
end;
$body$;

Compare this with the implementation that I thought, at first, that I could use when I simply believed the doc. (The subject line of this thread hits at the trivial SQL statement that would implement the "language SQL" function.) ANd if that's all there is to it, then when not ship is as a built-in?

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Adrian Klaver

Дата:

08 октября 2022 г., 03:27:17

On 10/7/22 17:16, Bryn Llewellyn wrote:
>> david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com> wrote:
>>
>>> bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote:
>>>
>>> (3) The PG doc on quote_ident says this in large friendly letters:
>>>
>>>> Quotes are added only if necessary…
>>>
>>> Notice "only". I now know that this is very much not the case. You 
>>> can compose an effectively unlimited number of different examples 
>>> along these lines:
>>>
>>> *select quote_ident('redaktør'); → "redaktør"
>>> create table redaktør(n int); → table successfully created
>>> *
>>
>> Yep, and that is precisely what would make for a good bug report. 
>> Pointing out that "if necessary" does not indeed match up with the 
>> behavior. I suspect it is likely to get changed - everything else 
>> being discussed just detracts attention from it.
> 
> **BRIEFLY**
> 
> What does "make for a good bug report" mean, David? Is it:

Oh for goodness sake just file a bug here:

https://www.postgresql.org/account/login/?next=/account/submitbug/

with the test case you showed in your previous post.

As to below:

1) If you want a guaranteed outcome then you are in the wrong business.

2) Excessive verbiage(writing for writing's sake) takes away from any 
argument you are trying to make. Less is more. I have come to the point 
where I ignore most of what you write as it really does not go anywhere 
other then make noise.







-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

Christophe Pettus

Дата:

08 октября 2022 г., 03:54:41

> On Oct 7, 2022, at 17:16, Bryn Llewellyn <bryn@yugabyte.com> wrote:
> What does "make for a good bug report" mean, David?

Well, first of all, brevity. :-)

> Is it:
>
> (1.1) You, David, or somebody else who has been officially recognized as a PG Contributor
(https://www.postgresql.org/community/contributors/)will file the bug, granting it credibility with their imprimatur? 
>
> or (1.2) I, Bryn, should file the bug.

That's unnecessarily snarky.  You are the one who feels that there is an issue, so you are the one who should report
thebug.  The community documents how to file a bug report here: 

    https://www.postgresql.org/docs/current/bug-reporting.html

People completely new to the community report bugs all the time; it's often their first contact with the community.

I don't think there is widespread agreement that this is as big an issue as you clearly feel it is, so it's up to you
topersuade the community that it's worth changing.  You may not be successful, and the implementation of quote_ident
staysthe same as it is now.  If that makes PostgreSQL useless to you, well, contact the bursar for a full refund. 

> About "I suspect it is likely to get changed", do you mean:
>
> (2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to
describeaccurately, precisely, and yet still tersely" for "unpredictable".) 
>
> (2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and
accurately?And for this option, the next question is "What's the spec of the changed implementation?" 

Make a proposal, and it can be debated.  If you feel up to it, prepare a documentation patch, a code patch, or both.
It'sjust SGML and C; they won't bite.  But you really need to make a specific, concrete, "it should do this instead"
proposal.

And, with all due respect, there is a strong "you FOOLS" tone to your conversation on the list, as if the community are
allmorons, stubborn, or both.  I would really suggest you dial that back, because it's getting in the way of your
often-reasonablepoints.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

От

"David G. Johnston"

Дата:

08 октября 2022 г., 05:10:34

On Fri, Oct 7, 2022 at 5:16 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:

david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

(3) The PG doc on quote_ident says this in large friendly letters:

Quotes are added only if necessary…

Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:

select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created

Yep, and that is precisely what would make for a good bug report. Pointing out that "if necessary" does not indeed match up with the behavior. I suspect it is likely to get changed - everything else being discussed just detracts attention from it.

*BRIEFLY*

What does "make for a good bug report" mean, David? Is it:

(1.1) You, David, or somebody else who has been officially recognized as a PG Contributor (https://www.postgresql.org/community/contributors/) will file the bug, granting it credibility with their imprimatur?

The research, evidence, and argument should be able to stand on their own. With those qualities it doesn't really matter too much the reputation of the person filing.

or (1.2) I, Bryn, should file the bug.

I was providing some suggested wording for how your original email could have been written as a simple bug report. And I definitely encourage people to take the time to consider and write good bug reports - while I do try and provide a community service in either writing or responding to such reports I am quite happy focusing on the later.

About "I suspect it is likely to get changed", do you mean:

(2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to describe accurately, precisely, and yet still tersely" for "unpredictable".)

The documentation would be my expectation, but the report doesn't need to presuppose either outcome, just point out the inconsistency.

(2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and accurately? And for this option, the next question is "What's the spec of the changed implementation?"

Notice that the issue is broader than just quote_ident, as this test shows:

Then add that to the report - they do indeed seem to be of a similar nature. If they weren't, then you'd have two bug reports.

I’m not convinced. The discussion has shown that some people are somewhat confused. For example, it was suggested that a name like this:

The follow-on conversation was likely to happen once the "why" of the inconsistency started to be discussed.

Compare this with the implementation that I thought, at first, that I could use when I simply believed the doc. (The subject line of this thread hits at the trivial SQL statement that would implement the "language SQL" function.) ANd if that's all there is to it, then when not ship is as a built-in?

I just suggest you separate straight-forward seeming bugs from philosophical discussions. This was a good example of a situation that was simple enough to be a bug - whether the docs or code get changed is what the bug report follow-up from the community is meant to work out.

David J.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE