Обсуждение: Quoting fun
I've been looking at SQL99 while reviewing a book, and stumbled across some (new to me) behavior for double-quoted identifiers. The SQL99 way to embed a double-quote into a quoted identifier is to put in two adjacent double-quotes (much like is done for embedding single-quotes into string literals in SQL9x). I've modified (but not yet committed) the pieces in parser (scan.l) and pg_dump (common.c) to accept and emit appropriate stuff for the embedded double-quote case. An example is create table "hi""there" (i int); \d List of relations Name | Type | Owner----------|-------|---------- hi"there| table | lockhart(1 row) Currently, pg_dump escapes this by embedding a backslash/double-quote pair, and I'm proposing that it emit two adjacent double-quotes instead (btw, scan.l does not seem to accept this for input at the moment ;). Any objections to committing this? Are there other cases which must be considered? On another somewhat related point: String literals can contain escaped characters, which postgres removes early in the parsing stage. These escapes are re-inserted *every time the string is returned in a query*. imho this is the wrong behavior, since the escapes were present in the input only to get around SQL9x syntax for allowable input characters (or to get around some postgres oddity). This is not an issue when sending strings back out from the server, except for perhaps the special case of the null character. And it's pretty silly to enter escaped strings, remove the escapes, then re-escape them for all user-visible interactions with the string. Except for perhaps string comparisons, there is hardly any point in bothering to unescape the string internally! I propose that we move the responsibility for re-escaping literal strings to pg_dump, which is the only utility with the charter to generate strings which are fully symmetric with input strings. We can provide libpq with an "escape routine" to assist other apps with this task if they need it. Comments? - Thomas
> On another somewhat related point: Hmm. In looking at pg_dump it seems that escaping string literals is already done there. So never mind on this second point... - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > I've been looking at SQL99 while reviewing a book, and stumbled across > some (new to me) behavior for double-quoted identifiers. The SQL99 way > to embed a double-quote into a quoted identifier is to put in two > adjacent double-quotes (much like is done for embedding single-quotes > into string literals in SQL9x). I looked at doing that a while ago, not because I knew it was in SQL99 but just because it seemed like a nice idea. I backed off though when I realized that there are a *lot* of places that will break. scan.l and pg_dump are just the tip of the iceberg --- there are many other places, and probably lots of applications, that assume printing "%s" is sufficient to protect an identifier. Be prepared for a lot of mop-up work if you want to press forward with this. > Currently, pg_dump escapes this by embedding a backslash/double-quote > pair, pg_dump is mistaken --- as you say, the backend doesn't accept backslashes in doublequoted idents. (Since there is no way to get a doublequote into an ident currently, pg_dump's check is dead code, which is why no one noticed it was broken.) > String literals can contain escaped characters, which postgres removes > early in the parsing stage. These escapes are re-inserted *every time > the string is returned in a query*. Au contraire, the backend never re-inserts escapes. regards, tom lane