Обсуждение: [HACKERS] Incautious handling of overlength identifiers

Поиск
Список
Период
Сортировка

[HACKERS] Incautious handling of overlength identifiers

От
Tom Lane
Дата:
Pursuant to the report here:
https://www.postgresql.org/message-id/7d0809ee-6f25-c9d6-8e74-5b2967830d49@enterprisedb.com
I tried to test all the built-in functions that take "text" (rather
than "name") arguments representing cataloged objects.  I was able
to provoke the same assertion failure in hashname() from these:

regression=# select has_column_privilege('postgres','tenk1',repeat('z', 200),repeat('q', 200));
server closed the connection unexpectedly
regression=# select has_language_privilege('postgres',repeat('y', 200),repeat('z', 200));
server closed the connection unexpectedly
regression=# select has_schema_privilege('postgres',repeat('y', 200),repeat('z', 200));
server closed the connection unexpectedly
regression=# select has_foreign_data_wrapper_privilege('postgres',repeat('y', 200),repeat('z', 200));
server closed the connection unexpectedly
regression=# select has_server_privilege('postgres',repeat('y', 200),repeat('z', 200));
server closed the connection unexpectedly
regression=# select pg_get_serial_sequence('tenk1',repeat('x', 200));
server closed the connection unexpectedly
regression=# select pg_get_object_address('table',array[repeat('x', 200)],array[repeat('y', 200)]);
server closed the connection unexpectedly

(Probably many of the code paths in pg_get_object_address can be made to
crash like this, but I stopped after finding one.)

Fortunately, a non-assert build will not crash like this, it'll just
produce a name-not-found failure.

Quite a lot of other functions that didn't crash produced errors
suggesting that they aren't truncating their inputs to NAMEDATALEN
before doing the lookup.  I think this is not expected behavior
either, since direct SQL references to such objects would always
be truncated.

Of the functions that did truncate, there was a remarkable lack
of consistency about whether they produced NOTICEs warning of
the truncation.  I'm not as concerned about that, but I wonder
whether we ought to have a uniform policy about it.

So what to do?  We could run around and fix these individual cases
and call it good, but if we do, I will bet a very fine dinner that
more such errors will sneak in before long.  Seems like we need a
coding convention that discourages just randomly treating a C string
as a valid value of type NAME.  Not sure how to get there though.

BTW, I also notice that parse_ident() doesn't truncate the identifiers
it parses.  ISTM this is a bad thing; isn't more or less the whole
point of that function to convert a string to a name as the Postgres
parser would do?

Comments?
        regards, tom lane



Re: [HACKERS] Incautious handling of overlength identifiers

От
Tom Lane
Дата:
I wrote:
> So what to do?  We could run around and fix these individual cases
> and call it good, but if we do, I will bet a very fine dinner that
> more such errors will sneak in before long.  Seems like we need a
> coding convention that discourages just randomly treating a C string
> as a valid value of type NAME.  Not sure how to get there though.

An alternative worth considering, especially for the back branches,
is simply to remove the Assert in hashname().  That would give us
the behavior that non-developers see anyway, which is that these
functions always fail to match overlength names, whether or not
the names would have matched after truncation.  Trying to apply
truncation more consistently could be left as an improvement
project for later.
        regards, tom lane



Re: [HACKERS] Incautious handling of overlength identifiers

От
Joe Conway
Дата:
On 12/23/2016 12:44 PM, Tom Lane wrote:
> I wrote:
>> So what to do?  We could run around and fix these individual cases
>> and call it good, but if we do, I will bet a very fine dinner that
>> more such errors will sneak in before long.  Seems like we need a
>> coding convention that discourages just randomly treating a C string
>> as a valid value of type NAME.  Not sure how to get there though.
>
> An alternative worth considering, especially for the back branches,
> is simply to remove the Assert in hashname().  That would give us
> the behavior that non-developers see anyway, which is that these
> functions always fail to match overlength names, whether or not
> the names would have matched after truncation.  Trying to apply
> truncation more consistently could be left as an improvement
> project for later.

That sounds reasonable to me.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] Incautious handling of overlength identifiers

От
Michael Paquier
Дата:
On Sat, Dec 24, 2016 at 7:44 AM, Joe Conway <mail@joeconway.com> wrote:
> On 12/23/2016 12:44 PM, Tom Lane wrote:
>> I wrote:
>>> So what to do?  We could run around and fix these individual cases
>>> and call it good, but if we do, I will bet a very fine dinner that
>>> more such errors will sneak in before long.  Seems like we need a
>>> coding convention that discourages just randomly treating a C string
>>> as a valid value of type NAME.  Not sure how to get there though.
>>
>> An alternative worth considering, especially for the back branches,
>> is simply to remove the Assert in hashname().  That would give us
>> the behavior that non-developers see anyway, which is that these
>> functions always fail to match overlength names, whether or not
>> the names would have matched after truncation.  Trying to apply
>> truncation more consistently could be left as an improvement
>> project for later.
>
> That sounds reasonable to me.

+1 for just removing the assertion on back-branches. On HEAD, it seems
right to me to keep the assertion. However it is not possible to just
switch those routines from text to name as a table could be defined
with its schema name. So at minimum this would require adjusting
textToQualifiedNameList() & similar routines in charge of putting in
shape the name lists. Another idea would be to have as a data type an
idea of "qualified name", where the schema and the table names are
truncated automatically at 63 characters, and have those catalog use
it. This way the parsing and truncation logic are directly part of the
input and output functions, and we could assume that the internal
representation is a list of names.
-- 
Michael



Re: [HACKERS] Incautious handling of overlength identifiers

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Dec 24, 2016 at 7:44 AM, Joe Conway <mail@joeconway.com> wrote:
>> On 12/23/2016 12:44 PM, Tom Lane wrote:
>>> An alternative worth considering, especially for the back branches,
>>> is simply to remove the Assert in hashname().  That would give us
>>> the behavior that non-developers see anyway, which is that these
>>> functions always fail to match overlength names, whether or not
>>> the names would have matched after truncation.  Trying to apply
>>> truncation more consistently could be left as an improvement
>>> project for later.

>> That sounds reasonable to me.

> +1 for just removing the assertion on back-branches. On HEAD, it seems
> right to me to keep the assertion. However it is not possible to just
> switch those routines from text to name as a table could be defined
> with its schema name. So at minimum this would require adjusting
> textToQualifiedNameList() & similar routines in charge of putting in
> shape the name lists.

textToQualifiedNameList() already does truncate, and I suspect everyplace
that deals in qualified names does as well.  The problem places are those
where someone just took a text parameter, did text_to_cstring on it, and
started treating the result as a Name (e.g. by passing it to a catalog
lookup function).  Since we've intentionally allowed C strings to be used
as Names in most places, it's not even immediately obvious that this is
wrong.

I'm currently inclined to think that removing the assertion from hashname()
is right even for HEAD, because it's not very relevant to the operation
of that routine (you'll get back some hash value even for overlength
strings), and because it's a pretty useless way of enforcing truncation.
It basically will only catch you if you try to do a syscache lookup to
resolve a name --- if you do a catalog heap or index scan, those paths
contain no such gotcha.  And I'm disinclined to introduce one.

The real problem with trying to enforce this through length assertions
in low-level routines is that they'll only reveal a bug if you actually
happen to test the appropriate calling code path with an overlength name.
We've obviously failed to do that in the past and I have little faith that
we'd do it in the future.

So that's why I was thinking about whether we could do this through some
datatype-based approach, whereby we could hope to catch incorrect coding
reliably through compiler checks.  But given our history of allowing C
strings as names, I'm afraid that any such change would be enormously
invasive and not worth the trouble.

Another idea worth considering is to just make the low-level functions
do truncation, ie the fix in hashname would look more like

-    Assert(keylen < NAMEDATALEN);
+    if (keylen >= NAMEDATALEN)
+        keylen = pg_mbcliplen(key, keylen, NAMEDATALEN - 1);

and we'd need something similar in the name comparison functions.
But that would be slightly annoying from a performance standpoint.
Not so much the extra pg_mbcliplen calls, because we could assume
those wouldn't happen in any performance-interesting cases; but
there are no strlen calls in the name comparison functions right now,
so we'd have to add some, and those would get hit every time through.
        regards, tom lane



Re: [HACKERS] Incautious handling of overlength identifiers

От
Tom Lane
Дата:
I wrote:
> Another idea worth considering is to just make the low-level functions
> do truncation ...

After further thought, there's a bigger-picture issue here, which
is whether the inputs to the SQL functions in question are intended to
be raw user input --- in which case, not only would truncation be
an appropriate service, but probably so would downcasing --- or whether
they are more likely to be coming from a catalog scan, in which case
you don't want any of that stuff.  Nobody's going to be happy if we
start making them add quote_ident() around references to name columns.
I think the privilege-inquiry functions are almost certainly mostly
used in the latter style; there might be more room for debate about,
say, pg_get_serial_sequence.

Since the low-level functions need to support both use cases, asking
them to handle truncation is wrong, just as much as it would be to
ask them to do downcasing.

If we take these SQL functions as being meant for use with inputs
coming from catalogs, then they don't need to do truncation for
user-friendliness purposes; it's perfectly fine to treat overlength
inputs as "name not found" cases.  So that says we could just remove
that Assert and decide we're done.
        regards, tom lane



Re: [HACKERS] Incautious handling of overlength identifiers

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> I wrote:
> > Another idea worth considering is to just make the low-level functions
> > do truncation ...
> 
> After further thought, there's a bigger-picture issue here, which
> is whether the inputs to the SQL functions in question are intended to
> be raw user input --- in which case, not only would truncation be
> an appropriate service, but probably so would downcasing --- or whether
> they are more likely to be coming from a catalog scan, in which case
> you don't want any of that stuff.  Nobody's going to be happy if we
> start making them add quote_ident() around references to name columns.
> I think the privilege-inquiry functions are almost certainly mostly
> used in the latter style; there might be more room for debate about,
> say, pg_get_serial_sequence.

I expect that uses of pg_get_object_address() (one of the affected
interfaces) would mostly be through an event trigger or a similar
internal mechanism, that hopefully should appropriately quote names and
not produce anything overlength.  At least, pg_identify_object() (which
is what I mostly had in mind) complies.  I think removing the assert is
a good enough solution, myself.

I also admit it didn't occur to me to test the function(s) against
overlength input when developing it.  I wouldn't object to adding code
to deal with overlength identifies, but I'm not really sure I would
choose truncation over reporting an error.  But whatever it'd be, it'd
be at that level, not at the lower (hash function) level.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Incautious handling of overlength identifiers

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I also admit it didn't occur to me to test the function(s) against
> overlength input when developing it.  I wouldn't object to adding code
> to deal with overlength identifies, but I'm not really sure I would
> choose truncation over reporting an error.  But whatever it'd be, it'd
> be at that level, not at the lower (hash function) level.

Yeah, I'm now convinced that whatever we do about this, if we do anything,
needs to be at a higher code level.  It's not hashname()'s job to prevent
use of overlength names.  I'll go remove the Assert.
        regards, tom lane