Discussion: Casting issues with domains
Hello all,

We experienced some casting issues with domains. We hit the problem while
querying the information_schema, but here is a simpler test case:

postgres=# create table test1 (a text);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)

Now we create a tstdom domain and cast the a column to tstdom in the view
definition:

postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create view test2 as select a::tstdom from test1;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
                         QUERY PLAN
----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
   Filter: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

As you can see, a is cast to tstdom and then back to text. These casts
prevent the optimizer from choosing an index scan to retrieve the data.
The casts are however strictly equivalent and should not prevent the
optimizer from using indexes.

Also, the same problem appears in the information_schema views, as every
object name is cast to information_schema.sql_identifier. Even though that
domain is declared over name, no index will be used because of the cast.

Shouldn't the planner simplify the casts when it's possible?

Regards,
Thomas
Thomas Reiss <thomas.reiss@dalibo.com> writes:
> postgres=# explain select * from test2 where a='toto';
>                          QUERY PLAN
> ----------------------------------------------------------
>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>    Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)

> As you can see, a is cast to tstdom and then back to text. These casts
> prevent the optimizer from choosing an index scan to retrieve the data.
> The casts are however strictly equivalent and should not prevent the
> optimizer from using indexes.

No, they are not equivalent.  The optimizer can't simply drop the
cast-to-domain, because that cast might result in a runtime error due
to a domain CHECK constraint violation.  (This is true even if no such
constraint exists at planning time, unfortunately.  If we had a
mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
no-constraints case could be handled better, but we don't; and adding
one would also imply adding more locks around domain usage, so it's not
all that attractive to do it.)

The short answer is that SQL domains are not zero-cost type aliases.
Perhaps there would be value in having a feature that *is* a zero-cost
alias, but it wouldn't be a domain.

			regards, tom lane
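[Editor's note: a quick illustration of the runtime-error hazard described
above, continuing the tstdom example from the first message. The constraint
name is made up, and this is a sketch rather than anything from the thread:]

```sql
-- Add a CHECK constraint to the domain after the view already exists.
-- From this point on, the cast to tstdom can raise an error at runtime,
-- which is why the planner cannot simply discard it.
ALTER DOMAIN tstdom ADD CONSTRAINT no_toto CHECK (VALUE <> 'toto');

-- A value that violates the constraint now errors out instead of
-- being silently relabeled:
SELECT 'toto'::tstdom;
-- ERROR:  value for domain tstdom violates check constraint "no_toto"
```

If the optimizer had dropped the "useless" cast in the view, the query
above would have returned a row instead of failing, silently changing
semantics.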
On 12/8/14, 9:18 AM, Tom Lane wrote:
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

Note that you can actually re-use the support functions of one data type
to create a new one. So if you wanted a special type called "document"
that actually behaved the same as text, you could do that fairly easily
(though not as easily as creating a domain).

If we were going to expend energy here, I suspect it would be more useful
to look at ways of creating new types without requiring C. C isn't an
option in many (even most) environments in today's "cloud" world, aside
from the intimidation factor. There are comments in the code that
hypothesize about making cstring a full type; that might be all that's
needed.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
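[Editor's note: a rough sketch of the kind of thing Jim describes, a
hand-rolled "document" type that piggybacks on text's built-in I/O
functions and is binary-compatible with it. All names here are
illustrative; note also that declaring implicit casts in both directions
can make function resolution ambiguous, so treat this as a sketch, not a
recipe:]

```sql
-- Declare a shell type so the I/O functions can reference it.
CREATE TYPE document;

-- Reuse text's internal input/output functions.
CREATE FUNCTION document_in(cstring) RETURNS document
    AS 'textin' LANGUAGE internal IMMUTABLE STRICT;
CREATE FUNCTION document_out(document) RETURNS cstring
    AS 'textout' LANGUAGE internal IMMUTABLE STRICT;

-- Fill in the type, copying text's storage properties.
CREATE TYPE document (
    INPUT  = document_in,
    OUTPUT = document_out,
    LIKE   = text
);

-- Binary-compatible casts; implicit in both directions so text's
-- operators and functions apply transparently (with the ambiguity
-- caveat noted above).
CREATE CAST (document AS text) WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (text AS document) WITHOUT FUNCTION AS IMPLICIT;
```

Because these casts are binary-coercible, they reduce to a RelabelType at
parse time rather than a function call, which is the "zero-cost" behavior
domains cannot provide.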
On 08/12/2014 16:18, Tom Lane wrote:
> Thomas Reiss <thomas.reiss@dalibo.com> writes:
>> postgres=# explain select * from test2 where a='toto';
>>                          QUERY PLAN
>> ----------------------------------------------------------
>>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>>    Filter: (((a)::tstdom)::text = 'toto'::text)
>> (2 rows)
>
>> As you can see, a is cast to tstdom and then back to text. These casts
>> prevent the optimizer from choosing an index scan to retrieve the data.
>> The casts are however strictly equivalent and should not prevent the
>> optimizer from using indexes.
>
> No, they are not equivalent.  The optimizer can't simply drop the
> cast-to-domain, because that cast might result in a runtime error due
> to a domain CHECK constraint violation.  (This is true even if no such
> constraint exists at planning time, unfortunately.  If we had a
> mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
> no-constraints case could be handled better, but we don't; and adding
> one would also imply adding more locks around domain usage, so it's not
> all that attractive to do it.)
>
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

I agree regarding the feature for zero-cost aliases. It would, for
example, ease access to the catalogs through the information_schema.

Thanks for your answer. There's some room for improvement for sure, but
it's not as easy as it seems.

Regards,
Thomas
Thomas Reiss <thomas.reiss@dalibo.com> wrote:

> postgres=# create table test1 (a text);
> CREATE TABLE
> postgres=# insert into test1 select generate_series(1,100000);
> INSERT 0 100000
> postgres=# create index idx1 on test1(a);
> CREATE INDEX
> postgres=# analyze test1;
> ANALYZE
> postgres=# explain select * from test1 where a = 'toto';
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
>    Index Cond: (a = 'toto'::text)
> (2 rows)
>
> Now we create a tstdom domain and cast the a column to tstdom in the
> view definition:
>
> postgres=# create domain tstdom as text;
> CREATE DOMAIN
> postgres=# create view test2 as select a::tstdom from test1;
> CREATE VIEW
> postgres=# explain select * from test2 where a='toto';
>                          QUERY PLAN
> ----------------------------------------------------------
>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>    Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)
>
> As you can see, a is cast to tstdom and then back to text. These casts
> prevent the optimizer from choosing an index scan to retrieve the data.
> The casts are however strictly equivalent and should not prevent the
> optimizer from using indexes.

You can create an index to be used for searching using the domain.
Following the steps in your example, you can run this:

postgres=# create index idx2 on test1 ((a::tstdom));
CREATE INDEX
postgres=# vacuum analyze test1;
VACUUM
postgres=# explain select * from test2 where a='toto';
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx2 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

It's even easier if "a" is defined to be a member of the domain in the
original table:

postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create table test1 (a tstdom);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)

postgres=# create view test2 as select a::tstdom from test1;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)

It's kinda hard for me to visualize where it makes sense to define the
original table column as the bare type but use a domain when referencing
it in the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> It's kinda hard for me to visualize where it makes sense to define
> the original table column as the bare type but use a domain when
> referencing it in the view.

As far as that goes, I think the OP was unhappy about the performance
of the information_schema views, which in our implementation do exactly
that so that the exposed types of the view columns conform to the SQL
standard, even though the underlying catalogs use PG-centric types.

I don't believe that that's the only reason why the performance of the
information_schema views tends to be sucky, but it's certainly a reason.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> It's kinda hard for me to visualize where it makes sense to define
>> the original table column as the bare type but use a domain when
>> referencing it in the view.
>
> As far as that goes, I think the OP was unhappy about the performance
> of the information_schema views, which in our implementation do exactly
> that so that the exposed types of the view columns conform to the SQL
> standard, even though the underlying catalogs use PG-centric types.
>
> I don't believe that that's the only reason why the performance of the
> information_schema views tends to be sucky, but it's certainly a reason.

Is that schema too "edge case" to justify some functional indexes on
the cast values on the underlying catalogs?  (I'm inclined to think
so, but it seemed like a question worth putting out there....)

Or, since these particular domains are known, is there any sane way to
"special-case" these to allow the underlying types to work?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As far as that goes, I think the OP was unhappy about the performance
>> of the information_schema views, which in our implementation do exactly
>> that so that the exposed types of the view columns conform to the SQL
>> standard, even though the underlying catalogs use PG-centric types.
>>
>> I don't believe that that's the only reason why the performance of the
>> information_schema views tends to be sucky, but it's certainly a reason.

> Is that schema too "edge case" to justify some functional indexes
> on the cast values on the underlying catalogs?  (I'm inclined to
> think so, but it seemed like a question worth putting out
> there....)

We don't support functional indexes on system catalogs, so whether it'd
be justified is sorta moot.  On the whole though I'm inclined to agree
that the information_schema views aren't used enough to justify adding
overhead to system-catalog updates, even if the pieces for that all
existed.

> Or, since these particular domains are known, is there any sane way
> to "special-case" these to allow the underlying types to work?

I don't particularly care for a kluge solution here.

I notice that recent versions of the SQL spec contain the notion of a
"distinct type", which is a user-defined type that is representationally
identical to some base type but has its own name, and comes equipped with
assignment-grade casts to and from the base type (which in PG terms would
be binary-compatible casts, though the spec doesn't require that).
It seems like this might be intended to be the sort of "zero cost type
alias" we were talking about, except that the SQL committee seems to have
got it wrong by not specifying the cast-to-base-type as being implicit.
Which ISTM you would want so that operators/functions on the base type
would apply automatically to the distinct type.  But perhaps we could
extend the spec with some option to CREATE TYPE to allow the cast to come
out that way.

Or in short, maybe we should try to replace the domains used in the
current information_schema with distinct types.

			regards, tom lane
On 11/12/2014 00:46, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> As far as that goes, I think the OP was unhappy about the performance
>>> of the information_schema views, which in our implementation do exactly
>>> that so that the exposed types of the view columns conform to the SQL
>>> standard, even though the underlying catalogs use PG-centric types.
>>>
>>> I don't believe that that's the only reason why the performance of the
>>> information_schema views tends to be sucky, but it's certainly a reason.
>
>> Is that schema too "edge case" to justify some functional indexes
>> on the cast values on the underlying catalogs?  (I'm inclined to
>> think so, but it seemed like a question worth putting out
>> there....)
>
> We don't support functional indexes on system catalogs, so whether it'd
> be justified is sorta moot.  On the whole though I'm inclined to agree
> that the information_schema views aren't used enough to justify adding
> overhead to system-catalog updates, even if the pieces for that all
> existed.
>
>> Or, since these particular domains are known, is there any sane way
>> to "special-case" these to allow the underlying types to work?
>
> I don't particularly care for a kluge solution here.
>
> I notice that recent versions of the SQL spec contain the notion of a
> "distinct type", which is a user-defined type that is representationally
> identical to some base type but has its own name, and comes equipped with
> assignment-grade casts to and from the base type (which in PG terms would
> be binary-compatible casts, though the spec doesn't require that).
> It seems like this might be intended to be the sort of "zero cost type
> alias" we were talking about, except that the SQL committee seems to have
> got it wrong by not specifying the cast-to-base-type as being implicit.
> Which ISTM you would want so that operators/functions on the base type
> would apply automatically to the distinct type.  But perhaps we could
> extend the spec with some option to CREATE TYPE to allow the cast to come
> out that way.
>
> Or in short, maybe we should try to replace the domains used in the
> current information_schema with distinct types.

That's interesting and could easily solve the problem.

To give some context: for some reason, Drupal queries the
information_schema views before displaying some pages. Our customer has
many tables (approx. 60000, organised "à la Oracle" with one schema per
database user), so the seq scan against pg_class takes ~50ms, while the
very same scan without the cast takes less than 1ms.

Here is an example of the queries used:

SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_schema = 'one_schema'
  AND table_name = 'one_table'
  AND (data_type = 'bytea'
       OR (numeric_precision IS NOT NULL
           AND column_default::text LIKE '%nextval%'));

Regards
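[Editor's note: until something like distinct types exists, a common
workaround for this kind of query is to bypass information_schema and read
pg_catalog directly, where no sql_identifier casts get in the way. The
sketch below is untested and only approximates the semantics of
information_schema.columns (for instance, it does not apply the privilege
filtering the standard views perform):]

```sql
-- Roughly equivalent to the information_schema.columns lookup above,
-- but against the base catalogs, so ordinary catalog indexes apply.
SELECT a.attname                             AS column_name,
       format_type(a.atttypid, a.atttypmod)  AS data_type,
       pg_get_expr(d.adbin, d.adrelid)       AS column_default
FROM pg_attribute a
JOIN pg_class c      ON c.oid = a.attrelid
JOIN pg_namespace n  ON n.oid = c.relnamespace
LEFT JOIN pg_attrdef d
       ON d.adrelid = a.attrelid AND d.adnum = a.attnum
WHERE n.nspname = 'one_schema'
  AND c.relname = 'one_table'
  AND a.attnum > 0              -- skip system columns
  AND NOT a.attisdropped;       -- skip dropped columns
```

With ~60000 tables this avoids the seq scan over pg_class entirely, since
the quals on nspname/relname are indexable name comparisons.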