Обсуждение: Conditional JOINs ?
Hello, I'm currently camping with a bit of a problem -- i have a common requests table, and based on an entry's type, I need to join a specific other table. Consider this database layout: ########################### -- table where all common request data is stored CREATE TABLE log.requests ( id BIGSERIAL, type SMALLINT NOT NULL, timestamp INTEGER NOT NULL, -- all kinds of common data ) -- only rows where log.requests.type = 1 CREATE TABLE log.requests1 ( request_id BIGINT NOT NULL ) -- only rows where log.requests.type = 2 CREATE TABLE log.requests2 ( request_id BIGINT NOT NULL ) etc (about 10 different types) the current VIEW: CREATE VIEW requests_view AS SELECT * FROM log.requests LEFT JOIN log.requests1 ON (log.requests.id = log.requests1.request_id) LEFT JOIN log.requests2 ON (log.requests.id = log.requests2.request_id) -- etc ########################### As you can see, if the requests table gets a bit large (50mil+ rows) and when there are many different types, there will be a lot of wasted JOINs. Now, based on a previous post on the PostgreSQL mailing list [http://archives.postgresql.org/pgsql-general/2007-11/msg00723.php] I came up with this solution: ########################### -- table where all common request data is stored CREATE TABLE log.requests ( id BIGSERIAL, type SMALLINT NOT NULL, ref1 BIGINT CHECK (type = 1) = (ref1 IS NOT NULL)), ref2 BIGINT CHECK (type = 2) = (ref2 IS NOT NULL)), timestamp INTEGER NOT NULL, -- all kinds of common data ) -- only rows where log.requests.type = 1 CREATE TABLE log.requests1 ( request_id BIGINT NOT NULL ) -- only rows where log.requests.type = 2 CREATE TABLE log.requestsz2 ( request_id BIGINT NOT NULL ) etc (about 10 different types) the current VIEW: CREATE VIEW requests_view AS SELECT * FROM log.requests LEFT JOIN log.requests1 ON (log.requests.ref1 = log.requests1.request_id) LEFT JOIN log.requests2 ON (log.requests.ref2 = log.requests2.request_id) -- etc ########################### Now, in my theory, you would say that if postgresql encounters ref1 = NULL, it will not attempt to JOIN the log.requests1 table. However, I've been told that because the PostgreSQL planner doesn't know that ref1 (or any other refX for that matter) is NULL, it will attempt to JOIN all tables for all rows. Is this true, and if so.. is there a workaround for this (perhaps that my database design is flawed) ? Regards, Leon Mergen
On Mar 18, 2008, at 8:06 PM, Leon Mergen wrote: > Hello, > > Now, based on a previous post on the PostgreSQL mailing list > [http://archives.postgresql.org/pgsql-general/2007-11/msg00723.php] I > came up with this solution: > > ########################### > -- table where all common request data is stored > CREATE TABLE log.requests ( > id BIGSERIAL, > type SMALLINT NOT NULL, > ref1 BIGINT CHECK (type = 1) = (ref1 IS NOT NULL)), > ref2 BIGINT CHECK (type = 2) = (ref2 IS NOT NULL)), > timestamp INTEGER NOT NULL, > -- all kinds of common data > ) > > -- only rows where log.requests.type = 1 > CREATE TABLE log.requests1 ( > request_id BIGINT NOT NULL > ) > > -- only rows where log.requests.type = 2 > CREATE TABLE log.requestsz2 ( > request_id BIGINT NOT NULL > ) > > etc (about 10 different types) > > the current VIEW: > CREATE VIEW requests_view AS > SELECT * FROM log.requests LEFT JOIN log.requests1 ON > (log.requests.ref1 = log.requests1.request_id) LEFT JOIN log.requests2 > ON (log.requests.ref2 = log.requests2.request_id) -- etc > ########################### > > Now, in my theory, you would say that if postgresql encounters ref1 = > NULL, it will not attempt to JOIN the log.requests1 table. However, > I've been told that because the PostgreSQL planner doesn't know that > ref1 (or any other refX for that matter) is NULL, it will attempt to > JOIN all tables for all rows. > > Is this true, and if so.. is there a workaround for this (perhaps that > my database design is flawed) ? This looks almost like table partitioning. If you inherit your requestxxx tables from a common requests table and add a check constraint to each inheriting table (a "partition"), the planner is smart enough to figure out that no rows in that partition can possibly match (constraint exclusion) and skips it. Instead of joining, it uses something equivalent to a UNION ALL btw, which I think is what you're looking for anyway. There have been plenty of posts about how to set up table partitioning, there's even an article about it in the standard Postgres documentation: http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,47e019899786732118417!
Hello Alban, On 3/18/08, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote: > > Now, in my theory, you would say that if postgresql encounters ref1 = > > NULL, it will not attempt to JOIN the log.requests1 table. However, > > I've been told that because the PostgreSQL planner doesn't know that > > ref1 (or any other refX for that matter) is NULL, it will attempt to > > JOIN all tables for all rows. > > > > Is this true, and if so.. is there a workaround for this (perhaps that > > my database design is flawed) ? > > > This looks almost like table partitioning. If you inherit your > requestxxx tables from a common requests table and add a check > constraint to each inheriting table (a "partition"), the planner is > smart enough to figure out that no rows in that partition can > possibly match (constraint exclusion) and skips it. > > Instead of joining, it uses something equivalent to a UNION ALL btw, > which I think is what you're looking for anyway. Well, the thing (as far as I'm aware) is that table partinioning and UNION ALL expect the table layouts to look the same, don't they ? The problem I'm having is that each row in a table has some 'additional' information, which is in another table, and can be retrieved based on a specific column in the table (request_type). Now, I fail to see how UNION ALL or table partitioning can solve this problem, which can be my problem -- am I missing some technique how table partitioning can be used to extend a base table with several extra tables that provide extra information ? -- Leon Mergen http://www.solatis.com
On Mar 18, 2008, at 3:06 PM, Leon Mergen wrote: > Hello Alban, > > On 3/18/08, Alban Hertroys <dalroi@solfertje.student.utwente.nl> > wrote: >>> Now, in my theory, you would say that if postgresql encounters >>> ref1 = >>> NULL, it will not attempt to JOIN the log.requests1 table. However, >>> I've been told that because the PostgreSQL planner doesn't know that >>> ref1 (or any other refX for that matter) is NULL, it will attempt to >>> JOIN all tables for all rows. >>> >>> Is this true, and if so.. is there a workaround for this (perhaps >>> that >>> my database design is flawed) ? >> >> >> This looks almost like table partitioning. If you inherit your >> requestxxx tables from a common requests table and add a check >> constraint to each inheriting table (a "partition"), the planner is >> smart enough to figure out that no rows in that partition can >> possibly match (constraint exclusion) and skips it. >> >> Instead of joining, it uses something equivalent to a UNION ALL btw, >> which I think is what you're looking for anyway. > > Well, the thing (as far as I'm aware) is that table partinioning and > UNION ALL expect the table layouts to look the same, don't they ? The > problem I'm having is that each row in a table has some 'additional' > information, which is in another table, and can be retrieved based on > a specific column in the table (request_type). > > Now, I fail to see how UNION ALL or table partitioning can solve this > problem, which can be my problem -- am I missing some technique how > table partitioning can be used to extend a base table with several > extra tables that provide extra information ? Table partitioning is normally implemented via table inheritance and you are free to add more, and different, columns to the "child" tables. Observe: CREATE SEQUENCE part_seq; CREATE TABLE parent ( id integer PRIMARY KEY DEFAULT nextval('part_seq'), foo text ); CREATE TABLE child1 ( bar text, CHECK(foo='some_type1'), PRIMARY KEY (id) ) INHERITS (parent); CREATE TABLE child2 ( baz text, CHECK(foo='some_type2'), PRIMARY KEY (id) ) INHERITS (parent); Now, both child1 and child2 have id and foo fields, child1 will only allow entries with foo='some_type1', child2 will only allow entries with foo='some_type2', and both children have extra fields that weren't present in the parent. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Hello Erik, On 3/18/08, Erik Jones <erik@myemma.com> wrote: > Table partitioning is normally implemented via table inheritance and > you are free to add more, and different, columns to the "child" tables. > > Observe: > > CREATE SEQUENCE part_seq; > CREATE TABLE parent ( > id integer PRIMARY KEY DEFAULT nextval('part_seq'), > foo text > ); > > CREATE TABLE child1 ( > bar text, > CHECK(foo='some_type1'), > PRIMARY KEY (id) > ) INHERITS (parent); > > CREATE TABLE child2 ( > baz text, > CHECK(foo='some_type2'), > PRIMARY KEY (id) > ) INHERITS (parent); > > Now, both child1 and child2 have id and foo fields, child1 will only > allow entries with foo='some_type1', child2 will only allow entries > with foo='some_type2', and both children have extra fields that > weren't present in the parent. Ah, silly that I failed to understand that. Thanks a lot for your response (Alban too) -- I can see table partitioning solving my problem. -- Leon Mergen http://www.solatis.com
On Mar 18, 2008, at 3:50 PM, Leon Mergen wrote: > Ah, silly that I failed to understand that. Nah, we all do that stuff. > Thanks a lot for your response (Alban too) -- I can see table > partitioning solving my problem. You're welcome! Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Hello, On 3/18/08, Erik Jones <erik@myemma.com> wrote: > Observe: > > CREATE SEQUENCE part_seq; > CREATE TABLE parent ( > id integer PRIMARY KEY DEFAULT nextval('part_seq'), > foo text > ); > > CREATE TABLE child1 ( > bar text, > CHECK(foo='some_type1'), > PRIMARY KEY (id) > ) INHERITS (parent); > > CREATE TABLE child2 ( > baz text, > CHECK(foo='some_type2'), > PRIMARY KEY (id) > ) INHERITS (parent); > > Now, both child1 and child2 have id and foo fields, child1 will only > allow entries with foo='some_type1', child2 will only allow entries > with foo='some_type2', and both children have extra fields that > weren't present in the parent. Excuse me for bumping this up again, but I still don't understand how to use this approach to sequentially walk through all different child tables in one select, without having to JOIN these tables all the time -- or will the planner 'understand' a query such as this: SELECT parent.*, child1.*, child2.* FROM parent LEFT JOIN child1 ON (parent.id = child1.id) LEFT JOIN child2 ON (parent.id = child2.id); When running explain on this, as I interpret it, it shows that the query plan will join both child1 and child2 on all the rows inside the parent table: QUERY PLAN ----------------------------------------------------------------------------------- Hash Left Join (cost=56.00..189.50 rows=2760 width=172) Hash Cond: (public.parent.id = child1.id) -> Hash Left Join (cost=28.00..123.55 rows=2760 width=104) Hash Cond: (public.parent.id = child2.id) -> Append (cost=0.00..57.60 rows=2760 width=36) -> Seq Scan on parent (cost=0.00..21.60 rows=1160 width=36) -> Seq Scan on child1 parent (cost=0.00..18.00 rows=800 width=36) -> Seq Scan on child2 parent (cost=0.00..18.00 rows=800 width=36) -> Hash (cost=18.00..18.00 rows=800 width=68) -> Seq Scan on child2 (cost=0.00..18.00 rows=800 width=68) -> Hash (cost=18.00..18.00 rows=800 width=68) -> Seq Scan on child1 (cost=0.00..18.00 rows=800 width=68) Now, of course there must be something I'm missing here.. but this seems like the solution of table inheritance will only result in the same problem I was having before -- either I need to JOIN every row on all child tables, or I need to specifically iterate over all the child tables, one child table at a time (which will probably result in even worse performance, since the 'parent' table is huge). Am I misunderstanding something here, or is there simple no solution for what I want ? Regards, Leon Mergen
On 3/19/08, Leon Mergen <leon@solatis.com> wrote: > Excuse me for bumping this up again, but I still don't understand how > to use this approach to sequentially walk through all different child > tables in one select, without having to JOIN these tables all the time Apparently a UNION all solved this problem -- sorry for the noise. -- Leon Mergen http://www.solatis.com
On Mar 19, 2008, at 9:01 AM, Leon Mergen wrote: > On 3/19/08, Leon Mergen <leon@solatis.com> wrote: >> Excuse me for bumping this up again, but I still don't understand how >> to use this approach to sequentially walk through all different child >> tables in one select, without having to JOIN these tables all the >> time > > Apparently a UNION all solved this problem -- sorry for the noise. If you have the child tables INHERITing from the parent, then a simple SELECT parent.* FROM parent; would be equivalent to manually spelling out a UNION ALL that explicitly lists all of the tables. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On 3/19/08, Erik Jones <erik@myemma.com> wrote: > >> Excuse me for bumping this up again, but I still don't understand how > >> to use this approach to sequentially walk through all different child > >> tables in one select, without having to JOIN these tables all the > >> time > > > > Apparently a UNION all solved this problem -- sorry for the noise. > > > If you have the child tables INHERITing from the parent, then a simple > > SELECT parent.* FROM parent; > > would be equivalent to manually spelling out a UNION ALL that > explicitly lists all of the tables. But this will only display the information that is common for all the child tables -- if I also want to display all the information that is specific for the child tables, as I understand it, I have to use a UNION ALL and merge all the child tables together this way. The EXPLAIN of this query: Append (cost=0.00..2169.52 rows=34376 width=94) -> Seq Scan ON child1 (cost=0.00..1824.71 rows=34371 width=94) -> Seq Scan ON child2 (cost=0.00..1.05 rows=5 width=56) Regards, Leon Mergen
Leon Mergen wrote: > On 3/19/08, Erik Jones <erik@myemma.com> wrote: > >> >> Excuse me for bumping this up again, but I still don't understand how >> >> to use this approach to sequentially walk through all different child >> >> tables in one select, without having to JOIN these tables all the >> >> time >> > >> > Apparently a UNION all solved this problem -- sorry for the noise. >> >> >> If you have the child tables INHERITing from the parent, then a simple >> >> SELECT parent.* FROM parent; >> >> would be equivalent to manually spelling out a UNION ALL that >> explicitly lists all of the tables. >> > > But this will only display the information that is common for all the > child tables -- if I also want to display all the information that is > specific for the child tables, as I understand it, I have to use a > UNION ALL and merge all the child tables together this way. > > The EXPLAIN of this query: > > Append (cost=0.00..2169.52 rows=34376 width=94) > -> Seq Scan ON child1 (cost=0.00..1824.71 rows=34371 width=94) > -> Seq Scan ON child2 (cost=0.00..1.05 rows=5 width=56) > > Regards, > > Leon Mergen > > What I think you desire is more in the form of: SELECT id, foo, bar, NULL AS "baz" FROM child1 UNION ALL SELECT id, foo, NULL, baz FROM child2. I think if you compare it to I/O volume, the joins will not cause many additional I/Os as long as the indexes on "id" for tables child1 and child2 will fit into memory. - Joris