Обсуждение: Quoting fun

Поиск
Список
Период
Сортировка

Quoting fun

От
Thomas Lockhart
Дата:
I've been looking at SQL99 while reviewing a book, and stumbled across
some (new to me) behavior for double-quoted identifiers. The SQL99 way
to embed a double-quote into a quoted identifier is to put in two
adjacent double-quotes (much like is done for embedding single-quotes
into string literals in SQL9x). 

I've modified (but not yet committed) the pieces in parser (scan.l) and
pg_dump (common.c) to accept and emit appropriate stuff for the embedded
double-quote case. An example is
 create table "hi""there" (i int); \d      List of relations   Name   | Type  |  Owner----------|-------|----------
hi"there| table | lockhart(1 row)
 

Currently, pg_dump escapes this by embedding a backslash/double-quote
pair, and I'm proposing that it emit two adjacent double-quotes instead
(btw, scan.l does not seem to accept this for input at the moment ;).
Any objections to committing this? Are there other cases which must be
considered?

On another somewhat related point:

String literals can contain escaped characters, which postgres removes
early in the parsing stage. These escapes are re-inserted *every time
the string is returned in a query*. imho this is the wrong behavior,
since the escapes were present in the input only to get around SQL9x
syntax for allowable input characters (or to get around some postgres
oddity). This is not an issue when sending strings back out from the
server, except for perhaps the special case of the null character. And
it's pretty silly to enter escaped strings, remove the escapes, then
re-escape them for all user-visible interactions with the string. Except
for perhaps string comparisons, there is hardly any point in bothering
to unescape the string internally!

I propose that we move the responsibility for re-escaping literal
strings to pg_dump, which is the only utility with the charter to
generate strings which are fully symmetric with input strings. We can
provide libpq with an "escape routine" to assist other apps with this
task if they need it.

Comments?
                   - Thomas


Re: Quoting fun

От
Thomas Lockhart
Дата:
> On another somewhat related point:

Hmm. In looking at pg_dump it seems that escaping string literals is
already done there. So never mind on this second point...
                    - Thomas


Re: Quoting fun

От
Tom Lane
Дата:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> I've been looking at SQL99 while reviewing a book, and stumbled across
> some (new to me) behavior for double-quoted identifiers. The SQL99 way
> to embed a double-quote into a quoted identifier is to put in two
> adjacent double-quotes (much like is done for embedding single-quotes
> into string literals in SQL9x). 

I looked at doing that a while ago, not because I knew it was in SQL99
but just because it seemed like a nice idea.  I backed off though when
I realized that there are a *lot* of places that will break.  scan.l
and pg_dump are just the tip of the iceberg --- there are many other
places, and probably lots of applications, that assume printing "%s"
is sufficient to protect an identifier.  Be prepared for a lot of
mop-up work if you want to press forward with this.

> Currently, pg_dump escapes this by embedding a backslash/double-quote
> pair,

pg_dump is mistaken --- as you say, the backend doesn't accept
backslashes in doublequoted idents.  (Since there is no way to get a
doublequote into an ident currently, pg_dump's check is dead code,
which is why no one noticed it was broken.)

> String literals can contain escaped characters, which postgres removes
> early in the parsing stage. These escapes are re-inserted *every time
> the string is returned in a query*.

Au contraire, the backend never re-inserts escapes.
        regards, tom lane