Обсуждение: BUG #18152: Join condition is not pushed down to union all subquery
The following bug has been logged on the website: Bug reference: 18152 Logged by: Lauri Kajan Email address: lauri.kajan@gmail.com PostgreSQL version: 16.0 Operating system: "PostgreSQL 16.0 (Debian 16.0-1.pgdg110+1) on x86_ Description: The join condition is not pushed down to a subquery containing UNION ALL when even a single query within the subquery contains a WHERE clause. This issue prevents the optimizer from creating a plan that utilizes the index on the join column. However, when the WHERE clause is moved to the top-level query, the optimizer produces the desired plan. EXPLAIN WITH targets AS ( SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE frame_size = 52 UNION ALL SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM cars ) SELECT dealers.name dealer, targets.vehicle, targets.id FROM dealers JOIN targets ON dealers.name = targets.dealer_name WHERE dealers.id IN (54,12,456) ; --- Hash Join (cost=16.91..4587.77 rows=325 width=41) (actual time=0.270..27.133 rows=275 loops=1) Hash Cond: (""*SELECT* 1"".dealer_name = dealers.name) -> Append (cost=0.00..4026.20 rows=108280 width=41) (actual time=0.009..20.376 rows=108222 loops=1) -> Subquery Scan on ""*SELECT* 1"" (cost=0.00..1908.80 rows=8280 width=41) (actual time=0.009..6.708 rows=8222 loops=1) -> Seq Scan on bikes (cost=0.00..1826.00 rows=8280 width=45) (actual time=0.009..6.114 rows=8222 loops=1) Filter: (frame_size = 52) Rows Removed by Filter: 91778 -> Seq Scan on cars (cost=0.00..1576.00 rows=100000 width=41) (actual time=0.012..8.433 rows=100000 loops=1) -> Hash (cost=16.88..16.88 rows=3 width=5) (actual time=0.017..0.018 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Index Scan using dealers_pkey on dealers (cost=0.28..16.88 rows=3 width=5) (actual time=0.009..0.015 rows=3 loops=1) Index Cond: (id = ANY ('{54,12,456}'::integer[])) Planning Time: 0.148 ms Execution Time: 27.170 ms Moving the WHERE clause to the top-level query produces a more efficient query execution plan. Example: EXPLAIN ANALYZE WITH targets AS ( SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes UNION ALL SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM cars ) SELECT dealers.name dealer, targets.vehicle, targets.id FROM dealers JOIN targets ON dealers.name = targets.dealer_name WHERE dealers.id IN (54,12,456) AND (frame_size IS null OR frame_size = 52) ; ---- Nested Loop (cost=5.32..1443.67 rows=325 width=41) (actual time=0.106..0.815 rows=275 loops=1) -> Index Scan using dealers_pkey on dealers (cost=0.28..16.88 rows=3 width=5) (actual time=0.019..0.031 rows=3 loops=1) Index Cond: (id = ANY ('{54,12,456}'::integer[])) -> Append (cost=5.04..474.52 rows=108 width=41) (actual time=0.048..0.250 rows=92 loops=3) -> Bitmap Heap Scan on bikes (cost=5.04..237.10 rows=8 width=41) (actual time=0.045..0.123 rows=9 loops=3) Recheck Cond: (dealer_name = dealers.name) Filter: ((frame_size IS NULL) OR (frame_size = 52)) Rows Removed by Filter: 91 Heap Blocks: exact=274 -> Bitmap Index Scan on bikes_dealer_name_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.010..0.010 rows=100 loops=3) Index Cond: (dealer_name = dealers.name) -> Bitmap Heap Scan on cars (cost=5.07..236.87 rows=100 width=41) (actual time=0.027..0.116 rows=82 loops=3) Recheck Cond: (dealer_name = dealers.name) Heap Blocks: exact=233 -> Bitmap Index Scan on cars_dealer_name_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.017..0.017 rows=82 loops=3) Index Cond: (dealer_name = dealers.name) Planning Time: 0.238 ms Execution Time: 0.863 ms Unfortunately this workaround may become complex if there are multiple filters for multiple tables, as each filtered column must be added to every unioned query. Here's a script to generate sample data for the example case: CREATE TABLE dealers AS SELECT id, (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM generate_series(1, 4) WHERE id>0) name FROM generate_series(1, 1000) AS id ; ALTER TABLE dealers ADD primary key (id); CREATE INDEX ON dealers(name); CREATE TABLE bikes AS SELECT generate_series AS id, (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int WHERE generate_series>0)) AS dealer_name, (random()*12+50)::int as frame_size FROM generate_series(1, 100000); ALTER TABLE bikes ADD primary key (id); CREATE INDEX ON bikes(dealer_name); CREATE TABLE cars AS SELECT generate_series as id, (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int WHERE generate_series>0)) AS dealer_name, (random()*7+14)::int as wheel_size FROM generate_series(1, 100000); ALTER TABLE cars ADD primary key (id); CREATE INDEX ON cars(dealer_name);
PG Bug reporting form <noreply@postgresql.org> writes: > The join condition is not pushed down to a subquery containing UNION ALL > when even a single query within the subquery contains a WHERE clause. I think this is less about "can't push down" than "can't pull up", specifically that prepjointree.c fails to flatten that subquery into an "appendrel member", per the comments in is_safe_append_member: * It's only safe to pull up the child if its jointree contains exactly * one RTE, else the AppendRelInfo data structure breaks. The one base RTE * could be buried in several levels of FromExpr, however. Also, if the * child's jointree is completely empty, we can pull up because * pull_up_simple_subquery will insert a single RTE_RESULT RTE instead. * * Also, the child can't have any WHERE quals because there's no place to * put them in an appendrel. (This is a bit annoying...) I don't recall at the moment if there are fundamental reasons not to have per-child quals in appendrels, or if it could be done with the application of enough elbow grease. But it's probably not trivial. That comment has been there quite awhile. regards, tom lane
On Wed, Oct 11, 2023 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think this is less about "can't push down" than "can't pull up",
specifically that prepjointree.c fails to flatten that subquery
into an "appendrel member", per the comments in is_safe_append_member:
Yeah, I agree with you that this is more about "can't pull up". The
first leaf of the UNION ALL appendrel is a subquery that has WHERE quals
so it fails to be pulled up, as explained in the comments you pointed
out.
first leaf of the UNION ALL appendrel is a subquery that has WHERE quals
so it fails to be pulled up, as explained in the comments you pointed
out.
I don't recall at the moment if there are fundamental reasons not to
have per-child quals in appendrels, or if it could be done with the
application of enough elbow grease. But it's probably not trivial.
That comment has been there quite awhile.
I'm wondering if we can keep the per-child quals in AppendRelInfos, and
then apply these quals when we create RelOptInfos for the children of an
appendrel, specifically in expand_appendrel_subquery().
I have a go at this and it can fix the origin problem.
EXPLAIN (COSTS OFF)
WITH
targets AS (
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
)
SELECT
dealers.name dealer,
targets.vehicle,
targets.id
FROM
dealers
JOIN targets
ON dealers.name = targets.dealer_name
WHERE
dealers.id IN (54,12,456);
QUERY PLAN
--------------------------------------------------------------
Nested Loop
-> Index Scan using dealers_pkey on dealers
Index Cond: (id = ANY ('{54,12,456}'::integer[]))
-> Append
-> Bitmap Heap Scan on bikes
Recheck Cond: (dealer_name = dealers.name)
Filter: (frame_size = 52)
-> Bitmap Index Scan on bikes_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
-> Bitmap Heap Scan on cars
Recheck Cond: (dealer_name = dealers.name)
-> Bitmap Index Scan on cars_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
(13 rows)
However, when applying the per-child quals in expand_appendrel_subquery,
I cannot find a way to make these quals go through the EC machinery.
And that would cause us to miss some optimal paths, such as
EXPLAIN (COSTS OFF)
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52 and frame_size = id
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars;
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on bikes
Filter: ((frame_size = 52) AND (frame_size = id))
-> Seq Scan on cars
(4 rows)
We should have been able to get 'bikes.id = 52' from '(frame_size = 52)
AND (frame_size = id)', and then use index scan on 'bikes', but ...
Maybe we can achieve that with more efforts, but I'm not sure if this is
worthwhile. Any thoughts?
Thanks
Richard
then apply these quals when we create RelOptInfos for the children of an
appendrel, specifically in expand_appendrel_subquery().
I have a go at this and it can fix the origin problem.
EXPLAIN (COSTS OFF)
WITH
targets AS (
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
)
SELECT
dealers.name dealer,
targets.vehicle,
targets.id
FROM
dealers
JOIN targets
ON dealers.name = targets.dealer_name
WHERE
dealers.id IN (54,12,456);
QUERY PLAN
--------------------------------------------------------------
Nested Loop
-> Index Scan using dealers_pkey on dealers
Index Cond: (id = ANY ('{54,12,456}'::integer[]))
-> Append
-> Bitmap Heap Scan on bikes
Recheck Cond: (dealer_name = dealers.name)
Filter: (frame_size = 52)
-> Bitmap Index Scan on bikes_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
-> Bitmap Heap Scan on cars
Recheck Cond: (dealer_name = dealers.name)
-> Bitmap Index Scan on cars_dealer_name_idx
Index Cond: (dealer_name = dealers.name)
(13 rows)
However, when applying the per-child quals in expand_appendrel_subquery,
I cannot find a way to make these quals go through the EC machinery.
And that would cause us to miss some optimal paths, such as
EXPLAIN (COSTS OFF)
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52 and frame_size = id
UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars;
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on bikes
Filter: ((frame_size = 52) AND (frame_size = id))
-> Seq Scan on cars
(4 rows)
We should have been able to get 'bikes.id = 52' from '(frame_size = 52)
AND (frame_size = id)', and then use index scan on 'bikes', but ...
Maybe we can achieve that with more efforts, but I'm not sure if this is
worthwhile. Any thoughts?
Thanks
Richard