Обсуждение: Reserved words and delimited identifiers
Hi, A few months ago, I got an email related to Pyrseas (http://lists.pgfoundry.org/pipermail/pyrseas-general/2011-August/000003.html) where the user reported he had a table named "user" and reported a failure in the dbtoyaml utility. I eventually implemented a simple quote_id function (only checks if an identifier needs quoting based on the characters in it, not on whether it's a keyword or not). I left adding a list of keywords as a "to do." Filip, the user, recommended that Pyrseas "quote all identifiers sent to database." I was reluctant to go with the latter solution, but after two other issues (Tryton database had a column named "limit" and another user a table named "order"), I started making the changes. However, I've found some PostgreSQL behavior that is inconsistent, at the very least with respect to the documentation. It's probably best shown by example (tested under 8.4 and 9.1): pyrseas_testdb=# CREATE FUNCTION "f1"("integer", "integer") RETURNS "integer" LANGUAGE "sql" AS $_$SELECT GREATEST($1, $2)$_$; ERROR: type integer does not exist pyrseas_testdb=# CREATE TABLE "t1" ("c1" "integer", "c2" "text"); ERROR: type "integer" does not exist LINE 1: CREATE TABLE "t1" ("c1" "integer", "c2" "text"); ^ pyrseas_testdb=# CREATE TABLE "t1" ("c1" integer, "c2" "text"); CREATE TABLE pyrseas_testdb=# CREATE DOMAIN "d1" AS integer; CREATE DOMAIN pyrseas_testdb=# CREATE TABLE "t2" ("c1" "d1", "c2" "text"); CREATE TABLE pyrseas_testdb=# CREATE TABLE "t3" ("c1" "int4", "c2" "text"); CREATE TABLE pyrseas_testdb=# CREATE TABLE "t4" ("c1" "int", "c2" "text"); ERROR: type "int" does not exist LINE 1: CREATE TABLE "t4" ("c1" "int", "c2" "text"); ^ pyrseas_testdb=# CREATE TABLE "t5" ("c1" "INTEGER", "c2" "text"); ERROR: type "INTEGER" does not exist LINE 1: CREATE TABLE "t5" ("c1" "INTEGER", "c2" "text"); ^ pyrseas_testdb=# CREATE DOMAIN "float" AS real; CREATE DOMAIN pyrseas_testdb=# CREATE TABLE "t6" ("c1" "float", "c2" "text"); CREATE TABLE pyrseas_testdb=# DROP TABLE "t6"; pyrseas_testdb=# DROP DOMAIN "float"; DROP DOMAIN pyrseas_testdb=# CREATE TABLE "t6" ("c1" "float", "c2" "text"); ERROR: type "float" does not exist LINE 1: CREATE TABLE "t6" ("c1" "float", "c2" "text"); ^ The last part is a killer. If "float" is a domain, then it can be quoted, otherwise it can't. The documentation appears to contradict this. http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html says FLOAT is not reserved and cannot be a function or type. I tried defining a "float" function and it was accepted: pyrseas_testdb=# CREATE FUNCTION "float"(integer) returns real language sql as $_$select $1::real$_$; CREATE FUNCTION It seems to me that since a TYPE in a column definition or function argument can be a non-native TYPE, it could be a reserved word and therefore it should always be allowable to quote the TYPE. Can someone please explain why that is not the case? Joe
On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote: > It seems to me that since a TYPE in a column definition or function > argument can be a non-native TYPE, it could be a reserved word and > therefore it should always be allowable to quote the TYPE. Can someone > please explain why that is not the case? Type names as they appear in pg_type.typname can always be quoted. But some types, like int4, have alternate names - e.g. int4 can be specified as integer or int, and foat8 can be specified using the two-word phrase double precision. These alternate names are keywords when unquoted, but identifiers (with a different meaning) when quoted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/29/2011 10:09 PM, Robert Haas wrote: > On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote: >> It seems to me that since a TYPE in a column definition or function >> argument can be a non-native TYPE, it could be a reserved word and >> therefore it should always be allowable to quote the TYPE. Can someone >> please explain why that is not the case? > > Type names as they appear in pg_type.typname can always be quoted. > But some types, like int4, have alternate names - e.g. int4 can be > specified as integer or int, and foat8 can be specified using the > two-word phrase double precision. These alternate names are keywords > when unquoted, but identifiers (with a different meaning) when quoted. What I don't understand is the following: pyrseas_testdb=# create table "integer" (c1 integer); drop table "integer"; CREATE TABLE DROP TABLE pyrseas_testdb=# create table "integer" ("integer" integer); drop table "integer"; CREATE TABLE DROP TABLE pyrseas_testdb=# create table "integer" ("integer" "integer"); ERROR: type "integer" does not exist LINE 1: create table "integer" ("integer" "integer"); ^ pyrseas_testdb=# create table "integer" ("integer" "INTEGER"); ERROR: type "INTEGER" does not exist LINE 1: create table "integer" ("integer" "INTEGER"); ^ Why does it allow quoting of "integer" as the table name and the column name, but not as the type name? Furthermore, pyrseas_testdb=# create domain "integer" as "int4"; CREATE DOMAIN pyrseas_testdb=# create table t1 ("integer" "integer"); CREATE TABLE pyrseas_testdb=# create domain "INTEGER" as int4; CREATE DOMAIN pyrseas_testdb=# create table t2 ("integer" "INTEGER"); CREATE TABLE Now that I created a DOMAIN/TYPE named "integer" or "INTEGER" (contradicting the SQL Key Words table), it does allow quoting. This behavior appears inconsistent with the general practice that allows quoting of type names. In other words, why am I allowed to quote "int4", but not "integer" or "INTEGER" (as a type name? Joe
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Nov 29, 2011 at 7:29 PM, Joe Abbate <jma@freedomcircle.com> wrote: >> It seems to me that since a TYPE in a column definition or function >> argument can be a non-native TYPE, it could be a reserved word and >> therefore it should always be allowable to quote the TYPE. Can someone >> please explain why that is not the case? > Type names as they appear in pg_type.typname can always be quoted. > But some types, like int4, have alternate names - e.g. int4 can be > specified as integer or int, and foat8 can be specified using the > two-word phrase double precision. These alternate names are keywords > when unquoted, but identifiers (with a different meaning) when quoted. Another way to say that is that the type int4 can be specified in two ways: int4 (an identifier)INTEGER (a keyword) Quoting "int4" is no problem, because it's still an identifier, but quoting "integer" takes away its keyword nature, so it doesn't get recognized. The fact that INTEGER is a keyword, and not an identifier, is per SQL specification. We could perhaps hack things so that "integer" as an identifier would also work, but I doubt that we'd go so far as to make "INTEGER" (a different identifier) also work, and we'd certainly not be able to do much about the spec's more exciting deviations from identifier-looking type names, such as CHARACTER VARYING or DOUBLE PRECISION. Or to put it more pithily: the SQL committee's ideas of good syntax seem to have been frozen around the time COBOL was invented. regards, tom lane
On Tue, Nov 29, 2011 at 11:21 PM, Joe Abbate <jma@freedomcircle.com> wrote: > Why does it allow quoting of "integer" as the table name and the column > name, but not as the type name? Furthermore, Because there's nothing called "integer" in the pg_type catalog. It's not really a type name; as Tom says, it's some random key word invented by the SQL committee that we map to a type name inside the parser somewhere. On the other hand, the table and column names have the opposite problem: you CAN'T use random keywords there; you can ONLY use identifiers. So in one case you must quote because you need to have an identifier rather than a keyword, whereas in the other case you must not quote because you need to have a keyword rather than an identifier. > pyrseas_testdb=# create domain "integer" as "int4"; > CREATE DOMAIN > pyrseas_testdb=# create table t1 ("integer" "integer"); > CREATE TABLE > pyrseas_testdb=# create domain "INTEGER" as int4; > CREATE DOMAIN > pyrseas_testdb=# create table t2 ("integer" "INTEGER"); > CREATE TABLE > > Now that I created a DOMAIN/TYPE named "integer" or "INTEGER" > (contradicting the SQL Key Words table), it does allow quoting. What else would you expect? It would be extremely strange if you created a type called "integer" and could not then define a column of type "integer". The key point is that what "integer" means has nothing to do with what the unquoted identifier INTEGER means; you could make "integer" be a domain over text if you wanted. You're not making it allow quoting; you're defining a completely new data type that happens to have a name very similar to an existing keyword (but when quoted, it's not a keyword, of course). On a similar note, it's not contradicting the SQL key words table, because once you quote it, *it's no longer a keyword*. The whole point of quoting identifiers is that it allows you to use names that would ordinarily be keywords as non-keywords. > This > behavior appears inconsistent with the general practice that allows > quoting of type names. In other words, why am I allowed to quote > "int4", but not "integer" or "INTEGER" (as a type name? There is no such general practice. In general, if something is an identifier, you can quote it. If it's a keyword and you want to make it an identifier, you can quote it to force that interpretation. Table and column names must be identifiers, so they can always be quoted; if the identifiers happen to be confusable with reserved keywords then you *must* quote them to avoid having them interpreted as keywords rather than identifiers. However, types can be referred to using either a keyword, or an identifier. Those that are identifiers can be quoted; those that are keywords cannot. I will cheerfully admit that this is confusing and inconvenient, and I didn't understand it either until I implemented pg_dump --quote-all-identifiers. However, I'm not sure there's any easy way to improve the situation. For example, what are we to do with character varying? That can't very well be treated as an identifier, because it's got a space in the middle. And even if we could hack around that, it's no good to treat it as "character varying" anyway, because then you'd have "character varying" != "varchar", and we certainly don't want those to be different types. What the current code does is map character varying to varchar under the hood, and then sometimes map it back on output. This does break things for client code that just wants to quote everything in the world (and you're not the first person to run up against this problem; I seem to recall noticing some sketchy-looking code in pgAgent or pgAdmin the last time I looked...) but don't see any realistic alternative that's less evil so I think we're stuck with it... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/29/2011 11:41 PM, Tom Lane wrote: > Another way to say that is that the type int4 can be specified in two > ways: > > int4 (an identifier) > INTEGER (a keyword) > > Quoting "int4" is no problem, because it's still an identifier, but > quoting "integer" takes away its keyword nature, so it doesn't get > recognized. > > The fact that INTEGER is a keyword, and not an identifier, is per SQL > specification. We could perhaps hack things so that "integer" as an > identifier would also work, but I doubt that we'd go so far as to make > "INTEGER" (a different identifier) also work, and we'd certainly not > be able to do much about the spec's more exciting deviations from > identifier-looking type names, such as CHARACTER VARYING or DOUBLE > PRECISION. > > Or to put it more pithily: the SQL committee's ideas of good syntax > seem to have been frozen around the time COBOL was invented. Thanks Tom and Robert. I think I understand the problem now. I guess I'll have to work around this "quirk" by dealing specially with type names and not quote them when they're in the shorter list of SQL Standard reserved words. Joe
Robert Haas <robertmhaas@gmail.com> writes: > I will cheerfully admit that this is confusing and inconvenient, and I > didn't understand it either until I implemented pg_dump > --quote-all-identifiers. However, I'm not sure there's any easy way > to improve the situation. ... especially without breaking compatibility with the SQL standard, which explicitly requires that INTEGER be a keyword. We've been around on this before, many years ago. By my reading of the spec, accepting "integer" (with the quotes) as a name for the standard integer type would actually be contrary to spec, or at least an extension in exactly the same way that "int4" is an extension. We could doubtless do it, but the project gets less rewarding the more you think about the other keyword type names required by spec. regards, tom lane
Excerpts from Joe Abbate's message of mié nov 30 02:15:09 -0300 2011: > Thanks Tom and Robert. I think I understand the problem now. I guess > I'll have to work around this "quirk" by dealing specially with type > names and not quote them when they're in the shorter list of SQL > Standard reserved words. I wonder if it would simpler to just not quote type names except when absolutely necessary. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 11/30/2011 09:02 AM, Alvaro Herrera wrote: > Excerpts from Joe Abbate's message of mié nov 30 02:15:09 -0300 2011: > >> Thanks Tom and Robert. I think I understand the problem now. I guess >> I'll have to work around this "quirk" by dealing specially with type >> names and not quote them when they're in the shorter list of SQL >> Standard reserved words. > I wonder if it would simpler to just not quote type names except when > absolutely necessary. > Yeah, and very much less ugly. Ploughing through masses of unnecessary quotes is they way to a headache. quote_ident() gets this right. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/30/2011 09:02 AM, Alvaro Herrera wrote: >> I wonder if it would simpler to just not quote type names except when >> absolutely necessary. > Yeah, and very much less ugly. Ploughing through masses of unnecessary > quotes is they way to a headache. quote_ident() gets this right. But, per this discussion, you can't just blindly apply quote_ident to a type name, because it may not *be* an identifier. One possible solution, if you're getting type information about columns from the server, is to cast the type OID to regtype and let the regtype output converter make all the decisions. It's less notation than a join to pg_type anyway. regards, tom lane
On 11/30/2011 09:55 AM, Tom Lane wrote: > One possible solution, if you're getting type information about columns > from the server, is to cast the type OID to regtype and let the regtype > output converter make all the decisions. It's less notation than a join > to pg_type anyway. Unfortunately, Pyrseas' yamltodb gets (some) type information from a YAML input file, so we can't do that. However, since user defined TYPEs are processed before TABLE definitions, we could search for TYPEs in the parallel catalogs (Python dictionaries) maintained in memory. So, given this: schema public: description: standard public schema table myuser: columns: - info: type: user type user: attributes: - name: text - pass: text we could generate the following SQL against an empty database: CREATE TYPE "user" AS (name text, pass text); CREATE TABLE myuser ( info "user"); Joe
Joe Abbate <jma@freedomcircle.com> wrote: > On 11/30/2011 09:55 AM, Tom Lane wrote: >> One possible solution, if you're getting type information about >> columns from the server, is to cast the type OID to regtype and >> let the regtype output converter make all the decisions. It's >> less notation than a join to pg_type anyway. > > Unfortunately, Pyrseas' yamltodb gets (some) type information from > a YAML input file, so we can't do that. However, since user > defined TYPEs are processed before TABLE definitions, we could > search for TYPEs in the parallel catalogs (Python dictionaries) > maintained in memory. So, given this: > > schema public: > description: standard public schema > table myuser: > columns: > - info: > type: user > type user: > attributes: > - name: text > - pass: text > > we could generate the following SQL against an empty database: > > CREATE TYPE "user" AS (name text, > pass text); > CREATE TABLE myuser ( > info "user"); You are prepared to handle the difference between char and "char", I hope. -Kevin
On 11/30/2011 11:26 AM, Kevin Grittner wrote: > You are prepared to handle the difference between char and "char", I > hope. We have not implemented a type "verifier" in Pyrseas. It currently generates SQL based on the type given in the input. In normal usage, dbtoyaml is expected to be invoked first, and it will generate quoted types if necessary, e.g., schema public: description: standard public schema table myuser: columns: - info: type: '"user"' - active: type: '"char"' - logons: type: integer type user: attributes: - name: text - pass: text The quotes above are because it selects format_type(atttypid, atttypmod) from pg_attribute. The YAML output can then be fed into yamltodb and will generate (assuming the "user" type and the first column of myuser already exist): ALTER TABLE myuser ADD COLUMN active "char"; ALTER TABLE myuser ADD COLUMN logons integer; In other words, Pyrseas depends on the ultimate type verifier: the PostgreSQL parser (and related routines). Joe
Joe Abbate <jma@freedomcircle.com> wrote: > On 11/30/2011 11:26 AM, Kevin Grittner wrote: >> You are prepared to handle the difference between char and >> "char", I hope. > Pyrseas depends on the ultimate type verifier: the > PostgreSQL parser (and related routines). OK. I just wanted to be sure that you were aware of that one; it surprises people sometimes that PostgreSQL includes both a char reserved word for a type and a "char" type which is completely different: test=# create table x (noq char, withq "char"); CREATE TABLE test=# \x on Expanded display is on. test=# select attnum, attname, atttypid, atttypid::regtype, (select typname from pg_type where oid = atttypid), attlen, atttypmod, attbyval, attstorage, attalign from pg_attribute where attrelid = 'x'::regclass and attnum > 0; -[ RECORD 1 ]--------- attnum | 1 attname | noq atttypid | 1042 atttypid | character typname | bpchar attlen | -1 atttypmod | 5 attbyval | f attstorage | x attalign | i -[ RECORD 2 ]--------- attnum | 2 attname | withq atttypid | 18 atttypid | "char" typname | char attlen | 1 atttypmod | -1 attbyval | t attstorage | p attalign | c -Kevin