Обсуждение: Using GROUPING SETS with more than one set disables predicate pushdown?
Hi! First of all, thanks for the great work! PostgreSQL is amazing, and community is super helpful. I found an unexpected behaviour in PostgreSQL, and was advised to post it to the performance mailing list on IRC. Using GROUPING SETS with more than one set disables predicate pushdown? Version: PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit Seems like when GROUPING SETS with at least two sets are used in the subquery, planner can not push WHERE clauses inside. Here are two queries that (I think) are equivalent, but produce very different execution plans leading to bad performance on real data - and in effect, making it impossible to abstract away non-trivial grouping logic into a view. It might as well be that queries are not really equivalent, but I don't see how. Same problem happens even if grouping sets are the same - like `GROUPING SETS ((), ())`. CREATE TEMPORARY TABLE test_gs ( x INT, y INT, z INT, PRIMARY KEY (x, y, z) ); EXPLAIN SELECT x, y, avg(z) AS mean FROM test_gs WHERE x = 1 GROUP BY x, GROUPING SETS ((y), ()); QUERY PLAN -----------------------------------------------------------------------------------------GroupAggregate (cost=0.15..8.65rows=20 width=40) Group Key: x, y Group Key: x -> Index Only Scan using test_gs_pkey on test_gs (cost=0.15..8.33 rows=10 width=12) Index Cond: (x = 1) (5 rows) EXPLAIN SELECT x, y, mean FROM ( SELECT x, y, avg(z) AS mean FROM test_gs GROUP BY x, GROUPINGSETS ((y), ()) ) AS g WHERE x = 1; QUERY PLAN --------------------------------------------------------------------------------------------GroupAggregate (cost=0.15..62.10rows=404 width=40) Group Key: test_gs.x, test_gs.y Group Key: test_gs.x Filter: (test_gs.x = 1) -> Index Only Scan using test_gs_pkey on test_gs (cost=0.15..41.75 rows=2040 width=12) (5 rows) The issue here is that the second query is not using index to filter on x = 1 , instead it reads all the tuples from an index and applies the filter. Here is also a description in gist: https://gist.github.com/zeveshe/cf92c9d2a6b14518af3180113e767ae7 Thanks a lot! -- Zakhar Shapurau zak@zvs.no +47 407 54 397
Zakhar Shapurau <zak@zvs.no> writes: > Using GROUPING SETS with more than one set disables predicate pushdown? It looks like this is a case that no one's gotten round to yet. The comment in the relevant code is * In some cases we may want to transfer a HAVING clause into WHERE. We * cannot do so if the HAVING clause containsaggregates (obviously) or * volatile functions (since a HAVING clause is supposed to be executed * only onceper group). We also can't do this if there are any nonempty * grouping sets; moving such a clause into WHERE wouldpotentially change * the results, if any referenced column isn't present in all the grouping * sets. (If thereare only empty grouping sets, then the HAVING clause * must be degenerate as discussed below.) Presumably, we could examine the grouping sets to identify column(s) present in all sets, and then allow the optimization for clauses that reference only such columns. Or maybe I'm misreading the comment (but then it needs clarification). regards, tom lane
On November 21, 2017 6:49:26 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Zakhar Shapurau <zak@zvs.no> writes: > >Presumably, we could examine the grouping sets to identify column(s) >present in all sets, and then allow the optimization for clauses that >reference only such columns. Or maybe I'm misreading the comment >(but then it needs clarification). By memory that sounds about right. IIRC we'd some slightly more elaborate logic when GS were introduced, but had to takeit out as buggy, and it was too late in the development cycle to come up with something better. Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On November 21, 2017 6:49:26 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Zakhar Shapurau <zak@zvs.no> writes: > >Presumably, we could examine the grouping sets to identify column(s) >present in all sets, and then allow the optimization for clauses that >reference only such columns. Or maybe I'm misreading the comment >(but then it needs clarification). By memory that sounds about right. IIRC we'd some slightly more elaborate logic when GS were introduced, but had to takeit out as buggy, and it was too late in the development cycle to come up with something better. Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.