Обсуждение: BUG #16558: `AND FALSE` increases planning time of query on 2 tables with 1000 partitions to more than 7 seconds
BUG #16558: `AND FALSE` increases planning time of query on 2 tables with 1000 partitions to more than 7 seconds
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 16558 Logged by: Marcin Barczyński Email address: mba.ogolny@gmail.com PostgreSQL version: 12.3 Operating system: Ubuntu 18.04.4 LTS Description: PostgreSQL server version: 12.3 Consider the following setup of empty tables partitioned first by `key1` and then by `key2`: DROP TABLE IF EXISTS demo1 CASCADE; DROP TABLE IF EXISTS demo2 CASCADE; CREATE TABLE demo1(key1 BIGINT, key2 BIGINT) PARTITION BY RANGE(key1); CREATE TABLE demo1_positive PARTITION OF demo1 FOR VALUES FROM (0) TO (MAXVALUE) PARTITION BY LIST (key2); CREATE TABLE demo1_negative PARTITION OF demo1 FOR VALUES FROM (MINVALUE) TO (0) PARTITION BY LIST (key2); CREATE TABLE demo2(key1 BIGINT, key2 BIGINT) PARTITION BY RANGE(key1); CREATE TABLE demo2_positive PARTITION OF demo2 FOR VALUES FROM (0) TO (MAXVALUE) PARTITION BY LIST (key2); CREATE TABLE demo2_negative PARTITION OF demo2 FOR VALUES FROM (MINVALUE) TO (0) PARTITION BY LIST (key2); ALTER TABLE demo1_positive ADD CONSTRAINT demo1_positive_pk PRIMARY KEY (key1, key2); ALTER TABLE demo1_negative ADD CONSTRAINT demo1_negative_pk PRIMARY KEY (key1, key2); ALTER TABLE demo2_positive ADD CONSTRAINT demo2_positive_pk PRIMARY KEY (key1, key2); ALTER TABLE demo2_negative ADD CONSTRAINT demo2_negative_pk PRIMARY KEY (key1, key2); DO $$ DECLARE i BIGINT; BEGIN FOR i IN SELECT * FROM generate_series(0, 1024) LOOP EXECUTE 'CREATE TABLE demo1_positive_' || i || ' PARTITION OF demo1_positive FOR VALUES IN (' || i || ');'; EXECUTE 'CREATE TABLE demo1_negative_' || i || ' PARTITION OF demo1_negative FOR VALUES IN (' || i || ');'; EXECUTE 'CREATE TABLE demo2_positive_' || i || ' PARTITION OF demo2_positive FOR VALUES IN (' || i || ');'; EXECUTE 'CREATE TABLE demo2_negative_' || i || ' PARTITION OF demo2_negative FOR VALUES IN (' || i || ');'; END LOOP; END$$; ANALYZE demo1; ANALYZE demo2; Now, let's investigate the planning time of a query limited to a single partition on both tables: EXPLAIN ANALYZE SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE demo1.key2 = 123 AND demo2.key2 = 123 AND FALSE; QUERY PLAN ------------------------------------------------------------------------------------- Result (cost=0.00..0.00 rows=0 width=32) (actual time=0.002..0.002 rows=0 loops=1) One-Time Filter: false Planning Time: 7113.014 ms Execution Time: 0.211 ms (4 rows) Planning time depends quadratically on the number of partitions: - 1 partition: 0.686 ms - 4 partitions: 0.689 ms - 16 partitions: 1.574 ms - 64 partitions: 15.325 ms - 256 partitions: 213.275 ms - 512 partitions: 1043.161 ms - 1024 partitions: 7113.014 ms Experimentally, I observed that removing `AND FALSE` condition vastly increases the planning time: EXPLAIN ANALYZE SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE demo1.key2 = 123 AND demo2.key2 = 123; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=13.61..72.91 rows=324 width=32) (actual time=0.011..0.011 rows=0 loops=1) (...) Planning Time: 0.659 ms Execution Time: 0.120 ms (22 rows) I expected that `AND FALSE` condition would not increase the planning time.
PG Bug reporting form <noreply@postgresql.org> writes: > Consider the following setup of empty tables partitioned first by `key1` and > then by `key2`: > ... > EXPLAIN ANALYZE > SELECT * > FROM demo1 > JOIN demo2 ON demo1.key2 = demo2.key2 > WHERE demo1.key2 = 123 > AND demo2.key2 = 123 > AND FALSE; What seems to be happening here is that expression simplification reduces the WHERE clause to just constant-false, ie SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE FALSE; and then, because there are no constraints that the partitioning logic can use to recognize that it need only consider a few of the partitions, it proceeds to generate a plan joining the entire partition trees. The individual elements of the plan get thrown away later due to the WHERE FALSE, but not before we've expended cycles considering them; so the planning time is not much different from what it'd be if you had no WHERE at all. I'm disinclined to consider this an interesting case, frankly. It's more an example of the documented fact that we're not very good yet with large numbers of partitions. regards, tom lane
On Wed, 29 Jul 2020 at 03:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > PG Bug reporting form <noreply@postgresql.org> writes: > > Consider the following setup of empty tables partitioned first by `key1` and > > then by `key2`: > > ... > > EXPLAIN ANALYZE > > SELECT * > > FROM demo1 > > JOIN demo2 ON demo1.key2 = demo2.key2 > > WHERE demo1.key2 = 123 > > AND demo2.key2 = 123 > > AND FALSE; > > What seems to be happening here is that expression simplification reduces > the WHERE clause to just constant-false, ie > > SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE FALSE; I wonder if we could maybe cheaply do something better for this case. There's a paragraph of comment in distribute_qual_to_rels() which reads: * When the clause contains no volatile functions either, it is actually a * pseudoconstant clause that will not change value during any one * execution of the plan, and hence can be used as a one-time qual in a * gating Result plan node. We put such a clause into the regular * RestrictInfo lists for the moment, but eventually createplan.c will * pull it out and make a gating Result node immediately above whatever * plan node the pseudoconstant clause is assigned to. It's usually best * to put a gating node as high in the plan tree as possible. If we are * not below an outer join, we can actually push the pseudoconstant qual * all the way to the top of the tree. If we are below an outer join, we * leave the qual at its original syntactic level (we could push it up to * just below the outer join, but that seems more complex than it's * worth). I guess that's so we generate the gating qual as high as possible in the plan tree, which is good for saving in the executor and saves work in createplan.c when the gating qual is const-false since we won't need to generate any plan below that. However, it's not very good for the planner as we still must create paths etc for the base relations. I wonder if we can do a bit better and make push const-false quals to the base-rels too. The attached is roughly what I mean. It's absent of any comment updates. Just aimed as a conversation aid at the moment. It does have a side-effect of having remove_rel_from_query() transfer the join-level gating qual to the baserel when a join is removed which does double up on the gating qual and turns "One-Time Filter: false" into "One-Time Filter: (false AND false)". We could check for that in remove_rel_from_query(), but it's a bit ugly... or maybe the whole thing is ugly, but I"m happy to hear thoughts on it. FWIW, Marcin's example case with the patch does: QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.002 rows=0 loops=1) One-Time Filter: false Planning Time: 0.077 ms Execution Time: 0.015 ms (4 rows) David
Вложения
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 29 Jul 2020 at 03:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What seems to be happening here is that expression simplification reduces >> the WHERE clause to just constant-false, ie >> SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE FALSE; > I wonder if we could maybe cheaply do something better for this case. I'd wondered about that too, but it seems like there's no way that it's not going to penalize every query in order to improve abysmally stupid queries. I will never buy that an ORM should be allowed to issue WHERE FALSE (rather than suppressing issuance of the query altogether) and expect that somebody else will mop up after its astonishing laziness. So I'm not big on adding complexity and cycles to something as critical as distribute_qual_to_rels to deal with it. FWIW, the existing "pseudoconstant" support is called that because in most useful queries, what it's dealing with is not actually a constant in this sense, but something that will require run-time evaluation. So the majority of that logic is dealing with real, useful query cases, and there is little if any that is optimizing WHERE FALSE per se. regards, tom lane
On Fri, 31 Jul 2020 at 12:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Wed, 29 Jul 2020 at 03:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> What seems to be happening here is that expression simplification reduces > >> the WHERE clause to just constant-false, ie > >> SELECT * FROM demo1 JOIN demo2 ON demo1.key2 = demo2.key2 WHERE FALSE; > > > I wonder if we could maybe cheaply do something better for this case. > > I'd wondered about that too, but it seems like there's no way that it's > not going to penalize every query in order to improve abysmally stupid > queries. I will never buy that an ORM should be allowed to issue WHERE > FALSE (rather than suppressing issuance of the query altogether) and > expect that somebody else will mop up after its astonishing laziness. > So I'm not big on adding complexity and cycles to something as critical > as distribute_qual_to_rels to deal with it. hmm ok. FWIW, I did have in mind that the FALSE had been constant folded from something more complex. I don't really have any particular interest in speeding up the queries that actually actually write WHERE false. It's just constant folding has already been done by this time so we can't really know if the qual was something more legitimate prior to the folding. David
David Rowley <dgrowleyml@gmail.com> writes: > hmm ok. FWIW, I did have in mind that the FALSE had been constant > folded from something more complex. Hm. Maybe that's a legitimate argument, not sure. But I'd still rather find someplace that's less critical performance-wise if we're going to try to hack up this case. I also wonder about cases like select * from (t1 join t2 on false) join t3 on t1.x=t3.y; If we were taking this seriously, it'd be nice to deduce that (1) the t1/t2 join is empty and (2) therefore so is the join to t3, so (3) we need not build paths for any of these base rels. I think the syntactically-driven method you're proposing would not catch that, which'd be problematic if t3 is the giant partitioned rel. [ wanders away wondering if reduce_outer_joins could be taught to do this... ] regards, tom lane