Обсуждение: [HACKERS] Incautious handling of overlength identifiers
Pursuant to the report here: https://www.postgresql.org/message-id/7d0809ee-6f25-c9d6-8e74-5b2967830d49@enterprisedb.com I tried to test all the built-in functions that take "text" (rather than "name") arguments representing cataloged objects. I was able to provoke the same assertion failure in hashname() from these: regression=# select has_column_privilege('postgres','tenk1',repeat('z', 200),repeat('q', 200)); server closed the connection unexpectedly regression=# select has_language_privilege('postgres',repeat('y', 200),repeat('z', 200)); server closed the connection unexpectedly regression=# select has_schema_privilege('postgres',repeat('y', 200),repeat('z', 200)); server closed the connection unexpectedly regression=# select has_foreign_data_wrapper_privilege('postgres',repeat('y', 200),repeat('z', 200)); server closed the connection unexpectedly regression=# select has_server_privilege('postgres',repeat('y', 200),repeat('z', 200)); server closed the connection unexpectedly regression=# select pg_get_serial_sequence('tenk1',repeat('x', 200)); server closed the connection unexpectedly regression=# select pg_get_object_address('table',array[repeat('x', 200)],array[repeat('y', 200)]); server closed the connection unexpectedly (Probably many of the code paths in pg_get_object_address can be made to crash like this, but I stopped after finding one.) Fortunately, a non-assert build will not crash like this, it'll just produce a name-not-found failure. Quite a lot of other functions that didn't crash produced errors suggesting that they aren't truncating their inputs to NAMEDATALEN before doing the lookup. I think this is not expected behavior either, since direct SQL references to such objects would always be truncated. Of the functions that did truncate, there was a remarkable lack of consistency about whether they produced NOTICEs warning of the truncation. I'm not as concerned about that, but I wonder whether we ought to have a uniform policy about it. So what to do? We could run around and fix these individual cases and call it good, but if we do, I will bet a very fine dinner that more such errors will sneak in before long. Seems like we need a coding convention that discourages just randomly treating a C string as a valid value of type NAME. Not sure how to get there though. BTW, I also notice that parse_ident() doesn't truncate the identifiers it parses. ISTM this is a bad thing; isn't more or less the whole point of that function to convert a string to a name as the Postgres parser would do? Comments? regards, tom lane
I wrote: > So what to do? We could run around and fix these individual cases > and call it good, but if we do, I will bet a very fine dinner that > more such errors will sneak in before long. Seems like we need a > coding convention that discourages just randomly treating a C string > as a valid value of type NAME. Not sure how to get there though. An alternative worth considering, especially for the back branches, is simply to remove the Assert in hashname(). That would give us the behavior that non-developers see anyway, which is that these functions always fail to match overlength names, whether or not the names would have matched after truncation. Trying to apply truncation more consistently could be left as an improvement project for later. regards, tom lane
On 12/23/2016 12:44 PM, Tom Lane wrote: > I wrote: >> So what to do? We could run around and fix these individual cases >> and call it good, but if we do, I will bet a very fine dinner that >> more such errors will sneak in before long. Seems like we need a >> coding convention that discourages just randomly treating a C string >> as a valid value of type NAME. Not sure how to get there though. > > An alternative worth considering, especially for the back branches, > is simply to remove the Assert in hashname(). That would give us > the behavior that non-developers see anyway, which is that these > functions always fail to match overlength names, whether or not > the names would have matched after truncation. Trying to apply > truncation more consistently could be left as an improvement > project for later. That sounds reasonable to me. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
On Sat, Dec 24, 2016 at 7:44 AM, Joe Conway <mail@joeconway.com> wrote: > On 12/23/2016 12:44 PM, Tom Lane wrote: >> I wrote: >>> So what to do? We could run around and fix these individual cases >>> and call it good, but if we do, I will bet a very fine dinner that >>> more such errors will sneak in before long. Seems like we need a >>> coding convention that discourages just randomly treating a C string >>> as a valid value of type NAME. Not sure how to get there though. >> >> An alternative worth considering, especially for the back branches, >> is simply to remove the Assert in hashname(). That would give us >> the behavior that non-developers see anyway, which is that these >> functions always fail to match overlength names, whether or not >> the names would have matched after truncation. Trying to apply >> truncation more consistently could be left as an improvement >> project for later. > > That sounds reasonable to me. +1 for just removing the assertion on back-branches. On HEAD, it seems right to me to keep the assertion. However it is not possible to just switch those routines from text to name as a table could be defined with its schema name. So at minimum this would require adjusting textToQualifiedNameList() & similar routines in charge of putting in shape the name lists. Another idea would be to have as a data type an idea of "qualified name", where the schema and the table names are truncated automatically at 63 characters, and have those catalog use it. This way the parsing and truncation logic are directly part of the input and output functions, and we could assume that the internal representation is a list of names. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Sat, Dec 24, 2016 at 7:44 AM, Joe Conway <mail@joeconway.com> wrote: >> On 12/23/2016 12:44 PM, Tom Lane wrote: >>> An alternative worth considering, especially for the back branches, >>> is simply to remove the Assert in hashname(). That would give us >>> the behavior that non-developers see anyway, which is that these >>> functions always fail to match overlength names, whether or not >>> the names would have matched after truncation. Trying to apply >>> truncation more consistently could be left as an improvement >>> project for later. >> That sounds reasonable to me. > +1 for just removing the assertion on back-branches. On HEAD, it seems > right to me to keep the assertion. However it is not possible to just > switch those routines from text to name as a table could be defined > with its schema name. So at minimum this would require adjusting > textToQualifiedNameList() & similar routines in charge of putting in > shape the name lists. textToQualifiedNameList() already does truncate, and I suspect everyplace that deals in qualified names does as well. The problem places are those where someone just took a text parameter, did text_to_cstring on it, and started treating the result as a Name (e.g. by passing it to a catalog lookup function). Since we've intentionally allowed C strings to be used as Names in most places, it's not even immediately obvious that this is wrong. I'm currently inclined to think that removing the assertion from hashname() is right even for HEAD, because it's not very relevant to the operation of that routine (you'll get back some hash value even for overlength strings), and because it's a pretty useless way of enforcing truncation. It basically will only catch you if you try to do a syscache lookup to resolve a name --- if you do a catalog heap or index scan, those paths contain no such gotcha. And I'm disinclined to introduce one. The real problem with trying to enforce this through length assertions in low-level routines is that they'll only reveal a bug if you actually happen to test the appropriate calling code path with an overlength name. We've obviously failed to do that in the past and I have little faith that we'd do it in the future. So that's why I was thinking about whether we could do this through some datatype-based approach, whereby we could hope to catch incorrect coding reliably through compiler checks. But given our history of allowing C strings as names, I'm afraid that any such change would be enormously invasive and not worth the trouble. Another idea worth considering is to just make the low-level functions do truncation, ie the fix in hashname would look more like - Assert(keylen < NAMEDATALEN); + if (keylen >= NAMEDATALEN) + keylen = pg_mbcliplen(key, keylen, NAMEDATALEN - 1); and we'd need something similar in the name comparison functions. But that would be slightly annoying from a performance standpoint. Not so much the extra pg_mbcliplen calls, because we could assume those wouldn't happen in any performance-interesting cases; but there are no strlen calls in the name comparison functions right now, so we'd have to add some, and those would get hit every time through. regards, tom lane
I wrote: > Another idea worth considering is to just make the low-level functions > do truncation ... After further thought, there's a bigger-picture issue here, which is whether the inputs to the SQL functions in question are intended to be raw user input --- in which case, not only would truncation be an appropriate service, but probably so would downcasing --- or whether they are more likely to be coming from a catalog scan, in which case you don't want any of that stuff. Nobody's going to be happy if we start making them add quote_ident() around references to name columns. I think the privilege-inquiry functions are almost certainly mostly used in the latter style; there might be more room for debate about, say, pg_get_serial_sequence. Since the low-level functions need to support both use cases, asking them to handle truncation is wrong, just as much as it would be to ask them to do downcasing. If we take these SQL functions as being meant for use with inputs coming from catalogs, then they don't need to do truncation for user-friendliness purposes; it's perfectly fine to treat overlength inputs as "name not found" cases. So that says we could just remove that Assert and decide we're done. regards, tom lane
Tom Lane wrote: > I wrote: > > Another idea worth considering is to just make the low-level functions > > do truncation ... > > After further thought, there's a bigger-picture issue here, which > is whether the inputs to the SQL functions in question are intended to > be raw user input --- in which case, not only would truncation be > an appropriate service, but probably so would downcasing --- or whether > they are more likely to be coming from a catalog scan, in which case > you don't want any of that stuff. Nobody's going to be happy if we > start making them add quote_ident() around references to name columns. > I think the privilege-inquiry functions are almost certainly mostly > used in the latter style; there might be more room for debate about, > say, pg_get_serial_sequence. I expect that uses of pg_get_object_address() (one of the affected interfaces) would mostly be through an event trigger or a similar internal mechanism, that hopefully should appropriately quote names and not produce anything overlength. At least, pg_identify_object() (which is what I mostly had in mind) complies. I think removing the assert is a good enough solution, myself. I also admit it didn't occur to me to test the function(s) against overlength input when developing it. I wouldn't object to adding code to deal with overlength identifies, but I'm not really sure I would choose truncation over reporting an error. But whatever it'd be, it'd be at that level, not at the lower (hash function) level. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I also admit it didn't occur to me to test the function(s) against > overlength input when developing it. I wouldn't object to adding code > to deal with overlength identifies, but I'm not really sure I would > choose truncation over reporting an error. But whatever it'd be, it'd > be at that level, not at the lower (hash function) level. Yeah, I'm now convinced that whatever we do about this, if we do anything, needs to be at a higher code level. It's not hashname()'s job to prevent use of overlength names. I'll go remove the Assert. regards, tom lane