Обсуждение: regclass and format('%I')

Поиск
Список
Период
Сортировка

regclass and format('%I')

От
Jason Dusek
Дата:
Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

    WITH conversions(casts, format, result) AS (
    VALUES (ARRAY['name']::regtype[],             '%I', format('%I',
name('select'))),
           (ARRAY['name']::regtype[],             '%L', format('%L',
name('select'))),
           (ARRAY['name']::regtype[],             '%s', format('%s',
name('select'))),
           (ARRAY['regclass']::regtype[],         '%I', format('%I',
regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%L', format('%L',
regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%s', format('%s',
regclass('select'))),
           (ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
    ) SELECT * FROM conversions;
          casts      | format |    result
    -----------------+--------+--------------
     {name}          | %I     | "select"
     {name}          | %L     | 'select'
     {name}          | %s     | select
     {regclass}      | %I     | """select"""
     {regclass}      | %L     | '"select"'
     {regclass}      | %s     | "select"
     {regclass,name} | %I     | """select"""
     {regclass,name} | %L     | '"select"'
     {regclass,name} | %s     | "select"

My assumption is that they both represent valid SQL identifiers, so it stands
to reason that `%I` should result in a valid identifier for both of them (or
neither one).

Kind Regards,
  Jason Dusek


Re: regclass and format('%I')

От
"David G. Johnston"
Дата:
On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com> wrote:
Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

    WITH conversions(casts, format, result) AS (
    VALUES (ARRAY['name']::regtype[],             '%I', format('%I',
name('select'))),
           (ARRAY['name']::regtype[],             '%L', format('%L',
name('select'))),
           (ARRAY['name']::regtype[],             '%s', format('%s',
name('select'))),
           (ARRAY['regclass']::regtype[],         '%I', format('%I',
regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%L', format('%L',
regclass('select'))),
           (ARRAY['regclass']::regtype[],         '%s', format('%s',
regclass('select'))),
           (ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select')))),
           (ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'))))
    ) SELECT * FROM conversions;
          casts      | format |    result
    -----------------+--------+--------------
     {name}          | %I     | "select"
     {name}          | %L     | 'select'
     {name}          | %s     | select
     {regclass}      | %I     | """select"""
     {regclass}      | %L     | '"select"'
     {regclass}      | %s     | "select"
     {regclass,name} | %I     | """select"""
     {regclass,name} | %L     | '"select"'
     {regclass,name} | %s     | "select"

My assumption is that they both represent valid SQL identifiers, so it stands
to reason that `%I` should result in a valid identifier for both of them (or
neither one).

​All three of the %I results are valid identifiers.

regclass performs the same conversion that %I performs.  But since the output of the regclass conversion is a valid identifier, with double-quotes, the %I adds another pair of double-quotes and doubles-up the existing pair thus leaving you with 6.

<select> is a reserved word and thus can only be used as an identifier if it is surrounded in double-quotes.  name() doesn't care (not that it is user-documented that I can find) about making its value usable as an identifier so when its output goes through %I you get the expected value.

​If you are going to use regclass you want to use %s to insert the result into your string; not %I​.

David J.

Re: regclass and format('%I')

От
Jason Dusek
Дата:
It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

On 13 March 2015 at 12:42, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com> wrote:
>>
>> Hi All,
>>
>> The difference in how format handles `regclass` and `name` seems like an
>> inconsistency:
>>
>>     WITH conversions(casts, format, result) AS (
>>     VALUES (ARRAY['name']::regtype[],             '%I', format('%I',
>> name('select'))),
>>            (ARRAY['name']::regtype[],             '%L', format('%L',
>> name('select'))),
>>            (ARRAY['name']::regtype[],             '%s', format('%s',
>> name('select'))),
>>            (ARRAY['regclass']::regtype[],         '%I', format('%I',
>> regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%L', format('%L',
>> regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%s', format('%s',
>> regclass('select'))),
>>            (ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
>> name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
>> name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
>> name(regclass('select'))))
>>     ) SELECT * FROM conversions;
>>           casts      | format |    result
>>     -----------------+--------+--------------
>>      {name}          | %I     | "select"
>>      {name}          | %L     | 'select'
>>      {name}          | %s     | select
>>      {regclass}      | %I     | """select"""
>>      {regclass}      | %L     | '"select"'
>>      {regclass}      | %s     | "select"
>>      {regclass,name} | %I     | """select"""
>>      {regclass,name} | %L     | '"select"'
>>      {regclass,name} | %s     | "select"
>>
>> My assumption is that they both represent valid SQL identifiers, so it
>> stands
>> to reason that `%I` should result in a valid identifier for both of them
>> (or
>> neither one).
>
>
> All three of the %I results are valid identifiers.
>
> regclass performs the same conversion that %I performs.  But since the
> output of the regclass conversion is a valid identifier, with double-quotes,
> the %I adds another pair of double-quotes and doubles-up the existing pair
> thus leaving you with 6.
>
> <select> is a reserved word and thus can only be used as an identifier if it
> is surrounded in double-quotes.  name() doesn't care (not that it is
> user-documented that I can find) about making its value usable as an
> identifier so when its output goes through %I you get the expected value.
>
> If you are going to use regclass you want to use %s to insert the result
> into your string; not %I.
>
> David J.
>


Re: regclass and format('%I')

От
Pavel Stehule
Дата:


2015-03-14 10:09 GMT+01:00 Jason Dusek <jason.dusek@gmail.com>:
It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

yes, it is true, when you use a safe type: regclass, regtype, you should not to use %I due double quoting.

postgres=# select 16398::regclass ;
-[ RECORD 1 ]-------
regclass | "omega a"

postgres=# select format('>>%I<<<', 16398::regclass );
-[ RECORD 1 ]--------------
format | >>"""omega a"""<<<

Should be fixed

Regards

Pavel

Regards

Pavel



On 13 March 2015 at 12:42, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek <jason.dusek@gmail.com> wrote:
>>
>> Hi All,
>>
>> The difference in how format handles `regclass` and `name` seems like an
>> inconsistency:
>>
>>     WITH conversions(casts, format, result) AS (
>>     VALUES (ARRAY['name']::regtype[],             '%I', format('%I',
>> name('select'))),
>>            (ARRAY['name']::regtype[],             '%L', format('%L',
>> name('select'))),
>>            (ARRAY['name']::regtype[],             '%s', format('%s',
>> name('select'))),
>>            (ARRAY['regclass']::regtype[],         '%I', format('%I',
>> regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%L', format('%L',
>> regclass('select'))),
>>            (ARRAY['regclass']::regtype[],         '%s', format('%s',
>> regclass('select'))),
>>            (ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
>> name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
>> name(regclass('select')))),
>>            (ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
>> name(regclass('select'))))
>>     ) SELECT * FROM conversions;
>>           casts      | format |    result
>>     -----------------+--------+--------------
>>      {name}          | %I     | "select"
>>      {name}          | %L     | 'select'
>>      {name}          | %s     | select
>>      {regclass}      | %I     | """select"""
>>      {regclass}      | %L     | '"select"'
>>      {regclass}      | %s     | "select"
>>      {regclass,name} | %I     | """select"""
>>      {regclass,name} | %L     | '"select"'
>>      {regclass,name} | %s     | "select"
>>
>> My assumption is that they both represent valid SQL identifiers, so it
>> stands
>> to reason that `%I` should result in a valid identifier for both of them
>> (or
>> neither one).
>
>
> All three of the %I results are valid identifiers.
>
> regclass performs the same conversion that %I performs.  But since the
> output of the regclass conversion is a valid identifier, with double-quotes,
> the %I adds another pair of double-quotes and doubles-up the existing pair
> thus leaving you with 6.
>
> <select> is a reserved word and thus can only be used as an identifier if it
> is surrounded in double-quotes.  name() doesn't care (not that it is
> user-documented that I can find) about making its value usable as an
> identifier so when its output goes through %I you get the expected value.
>
> If you are going to use regclass you want to use %s to insert the result
> into your string; not %I.
>
> David J.
>


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: regclass and format('%I')

От
Tom Lane
Дата:
Jason Dusek <jason.dusek@gmail.com> writes:
> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass.

You're mistaken.  The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want.  But it's not
format()'s job to be smarter than the user.  If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.

            regards, tom lane


Re: regclass and format('%I')

От
"David G. Johnston"
Дата:
On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:
It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.


I agree with the theory but adding type specific logic to format is going to be difficult.  The first thing the system does is convert all of the inputs to text.  Inside format() everything is text and so it has no way to know that the type was regclass and should not be quoted again.

David J.

Re: regclass and format('%I')

От
Jason Dusek
Дата:
On 14 March 2015 at 09:17, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:
>> It honestly seems far more reasonable to me that %s and %I should do
>> the exact same thing with regclass. My reasoning is as follows:
>>
>> ‘%I’ formats a something such that it is a valid identifier,
>>
>> regclass is already a valid identifier,
>>
>> therefore, do nothing.
>>
>> Another line of reasoning:
>>
>> If you format with ‘%s’ you are saying: I don’t care whether it’s a
>> valid identifier or literal or whatever, just put the string there,
>>
>> but when we sub a regclass into a string, we want it to be a valid
>> identifier,
>>
>> therefore we should write ‘%I’ when subbing it, so as not to confuse
>> our readers,
>>
>> therefore ‘%I’ should do nothing.
>>
>
> I agree with the theory but adding type specific logic to format is going to
> be difficult.  The first thing the system does is convert all of the inputs
> to text.  Inside format() everything is text and so it has no way to know
> that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.

Kind Regards,
  Jason Dusek


Re: regclass and format('%I')

От
Pavel Stehule
Дата:


2015-03-15 3:09 GMT+01:00 Jason Dusek <jason.dusek@gmail.com>:
On 14 March 2015 at 09:17, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Saturday, March 14, 2015, Jason Dusek <jason.dusek@gmail.com> wrote:
>> It honestly seems far more reasonable to me that %s and %I should do
>> the exact same thing with regclass. My reasoning is as follows:
>>
>> ‘%I’ formats a something such that it is a valid identifier,
>>
>> regclass is already a valid identifier,
>>
>> therefore, do nothing.
>>
>> Another line of reasoning:
>>
>> If you format with ‘%s’ you are saying: I don’t care whether it’s a
>> valid identifier or literal or whatever, just put the string there,
>>
>> but when we sub a regclass into a string, we want it to be a valid
>> identifier,
>>
>> therefore we should write ‘%I’ when subbing it, so as not to confuse
>> our readers,
>>
>> therefore ‘%I’ should do nothing.
>>
>
> I agree with the theory but adding type specific logic to format is going to
> be difficult.  The first thing the system does is convert all of the inputs
> to text.  Inside format() everything is text and so it has no way to know
> that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.

I don't think, so it can help - first, it is workaround and it doesn't help for somebody who doesn't read a documentation. Same effect you can get if you write "doesn't use %I for regclass, regtype types", although this sentence is strange. I agree with you so it is bug or minimally not user friendly design.

A some good solution should be safe function quote_identif, that protect us against double quoting. This logic should be elsewhere than inside "format" function. I am thinking so we can do it. It breaks nothing. Implementation should not be too much complex, because "new" function quote_identif can do nothing for safe types, so it can take string as input parameter and typid as second parameter.

Pavel
 

Kind Regards,
  Jason Dusek


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: regclass and format('%I')

От
"David G. Johnston"
Дата:
On Sat, Mar 14, 2015 at 8:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jason Dusek <jason.dusek@gmail.com> writes:
> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass.

You're mistaken.  The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want.  But it's not
format()'s job to be smarter than the user.  If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.



TBH ​I'm not all that convinced by this argument​.

First, it is not being smarter than the user but allowing the user to generalize their problem so that they do not need to take the nature of the input data into account and can write a semantically meaningful pattern string instead.  The risk of them incorrectly choosing between %s or %I and opening a security hole seems higher - if not as widespread - than any string logic we could apply.

Second, presupposing the the transformation of the input must be a single "thing", and that we are doing the %I conversion based upon our own internal (or SQL's at the matter may be) definition of what it means to "quote an identifier", we should be capable of noticing that the provided input is already a single "thing" which has been escaped according to said rules.

​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how it is possible ​for format to lay in a value at %I that is any more insecure than the current behavior.  If the input string already matches that pattern then it could be output as-is without any additional risk and with the positive benefit of making this case work as expected.  The broken case then exists when someone actually intends to name their identifier <"something"> which then correctly becomes <"""something"""> on output.

Since there is a behavior change involved there needs to be a convincing use-case for the new behavior in order to justify the effort to change it.

David J.

Re: regclass and format('%I')

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
> it is possible ​for format to lay in a value at %I that is any more
> insecure than the current behavior.  If the input string already matches
> that pattern then it could be output as-is without any additional risk and
> with the positive benefit of making this case work as expected.  The broken
> case then exists when someone actually intends to name their identifier
> <"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input.  It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output.  You
can do
    format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label.  This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

            regards, tom lane


Re: regclass and format('%I')

От
"David G. Johnston"
Дата:
On Sunday, March 15, 2015, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
> it is possible ​for format to lay in a value at %I that is any more
> insecure than the current behavior.  If the input string already matches
> that pattern then it could be output as-is without any additional risk and
> with the positive benefit of making this case work as expected.  The broken
> case then exists when someone actually intends to name their identifier
> <"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input.  It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output.  You
can do
        format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label.  This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

                        regards, tom lane

Ok, but that doesn't impact security.

Contracts can be amended to be more practical and what is being suggested is not as radical as you make it out to be.  I'm with Jason and dislike the "use %s" option and having a behavior of "quote if necessary" is not unreasonable.  Maybe it needs to be a different code for compatibility reasons.  In hindsight I'd make %Q the current unconditional quoting behavior and use %I for conditional quoting but using %Q for that now would at least make the behavior available.

David J.

 

Re: regclass and format('%I')

От
Jason Dusek
Дата:
On 15 March 2015 at 08:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
> it is possible ​for format to lay in a value at %I that is any more
> insecure than the current behavior.  If the input string already matches
> that pattern then it could be output as-is without any additional risk and
> with the positive benefit of making this case work as expected.  The broken
> case then exists when someone actually intends to name their identifier
> <"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input.  It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output.  You
can do
        format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label.  This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

I agree with Tom that we shouldn't key off of contents in the string to determine whether or not to quote. Introducing the behave I describe in an intuitive way would require some kind of type-specific handling in format(). I'm not sure what the cost of this is to the project, but David makes the very reasonable point that imposing the burden of choosing between `%s` and `%I` opens up the possibility of confusing vulnerabilities.

Kind Regards,
  Jason Dusek