Обсуждение: BUG #18152: Join condition is not pushed down to union all subquery

Поиск
Список
Период
Сортировка

BUG #18152: Join condition is not pushed down to union all subquery

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18152
Logged by:          Lauri Kajan
Email address:      lauri.kajan@gmail.com
PostgreSQL version: 16.0
Operating system:   "PostgreSQL 16.0 (Debian 16.0-1.pgdg110+1) on x86_
Description:

The join condition is not pushed down to a subquery containing UNION ALL
when even a single query within the subquery contains a WHERE clause. This
issue prevents the optimizer from creating a plan that utilizes the index on
the join column. However, when the WHERE clause is moved to the top-level
query, the optimizer produces the desired plan.

EXPLAIN
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456)
;
---
Hash Join  (cost=16.91..4587.77 rows=325 width=41) (actual
time=0.270..27.133 rows=275 loops=1)
  Hash Cond: (""*SELECT* 1"".dealer_name = dealers.name)
  ->  Append  (cost=0.00..4026.20 rows=108280 width=41) (actual
time=0.009..20.376 rows=108222 loops=1)
        ->  Subquery Scan on ""*SELECT* 1""  (cost=0.00..1908.80 rows=8280
width=41) (actual time=0.009..6.708 rows=8222 loops=1)
              ->  Seq Scan on bikes  (cost=0.00..1826.00 rows=8280 width=45)
(actual time=0.009..6.114 rows=8222 loops=1)
                    Filter: (frame_size = 52)
                    Rows Removed by Filter: 91778
        ->  Seq Scan on cars  (cost=0.00..1576.00 rows=100000 width=41)
(actual time=0.012..8.433 rows=100000 loops=1)
  ->  Hash  (cost=16.88..16.88 rows=3 width=5) (actual time=0.017..0.018
rows=3 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using dealers_pkey on dealers  (cost=0.28..16.88
rows=3 width=5) (actual time=0.009..0.015 rows=3 loops=1)
              Index Cond: (id = ANY ('{54,12,456}'::integer[]))
Planning Time: 0.148 ms
Execution Time: 27.170 ms



Moving the WHERE clause to the top-level query produces a more efficient
query execution plan. Example:

EXPLAIN ANALYZE
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456)
  AND (frame_size IS null OR frame_size = 52)
;
----
Nested Loop  (cost=5.32..1443.67 rows=325 width=41) (actual
time=0.106..0.815 rows=275 loops=1)
  ->  Index Scan using dealers_pkey on dealers  (cost=0.28..16.88 rows=3
width=5) (actual time=0.019..0.031 rows=3 loops=1)
        Index Cond: (id = ANY ('{54,12,456}'::integer[]))
  ->  Append  (cost=5.04..474.52 rows=108 width=41) (actual
time=0.048..0.250 rows=92 loops=3)
        ->  Bitmap Heap Scan on bikes  (cost=5.04..237.10 rows=8 width=41)
(actual time=0.045..0.123 rows=9 loops=3)
              Recheck Cond: (dealer_name = dealers.name)
              Filter: ((frame_size IS NULL) OR (frame_size = 52))
              Rows Removed by Filter: 91
              Heap Blocks: exact=274
              ->  Bitmap Index Scan on bikes_dealer_name_idx
(cost=0.00..5.04 rows=100 width=0) (actual time=0.010..0.010 rows=100
loops=3)
                    Index Cond: (dealer_name = dealers.name)
        ->  Bitmap Heap Scan on cars  (cost=5.07..236.87 rows=100 width=41)
(actual time=0.027..0.116 rows=82 loops=3)
              Recheck Cond: (dealer_name = dealers.name)
              Heap Blocks: exact=233
              ->  Bitmap Index Scan on cars_dealer_name_idx
(cost=0.00..5.04 rows=100 width=0) (actual time=0.017..0.017 rows=82
loops=3)
                    Index Cond: (dealer_name = dealers.name)
Planning Time: 0.238 ms
Execution Time: 0.863 ms

Unfortunately this workaround may become complex if there are multiple
filters for multiple tables, as each filtered column must be added to every
unioned query.


Here's a script to generate sample data for the example case:

CREATE TABLE dealers AS
SELECT
  id,
  (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM
generate_series(1, 4) WHERE id>0) name
FROM generate_series(1, 1000) AS id
;
ALTER TABLE dealers ADD primary key (id);
CREATE INDEX ON dealers(name);

CREATE TABLE bikes AS
SELECT
  generate_series AS id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int
WHERE generate_series>0)) AS dealer_name,
  (random()*12+50)::int as frame_size
FROM generate_series(1, 100000);
ALTER TABLE bikes ADD primary key (id);
CREATE INDEX ON bikes(dealer_name);

CREATE TABLE cars AS
SELECT
  generate_series as id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int
WHERE generate_series>0)) AS dealer_name,
  (random()*7+14)::int as wheel_size
FROM generate_series(1, 100000);
ALTER TABLE cars ADD primary key (id);
CREATE INDEX ON cars(dealer_name);


Re: BUG #18152: Join condition is not pushed down to union all subquery

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> The join condition is not pushed down to a subquery containing UNION ALL
> when even a single query within the subquery contains a WHERE clause.

I think this is less about "can't push down" than "can't pull up",
specifically that prepjointree.c fails to flatten that subquery
into an "appendrel member", per the comments in is_safe_append_member:

     * It's only safe to pull up the child if its jointree contains exactly
     * one RTE, else the AppendRelInfo data structure breaks. The one base RTE
     * could be buried in several levels of FromExpr, however.  Also, if the
     * child's jointree is completely empty, we can pull up because
     * pull_up_simple_subquery will insert a single RTE_RESULT RTE instead.
     *
     * Also, the child can't have any WHERE quals because there's no place to
     * put them in an appendrel.  (This is a bit annoying...)

I don't recall at the moment if there are fundamental reasons not to
have per-child quals in appendrels, or if it could be done with the
application of enough elbow grease.  But it's probably not trivial.
That comment has been there quite awhile.

            regards, tom lane



Re: BUG #18152: Join condition is not pushed down to union all subquery

От
Richard Guo
Дата:

On Wed, Oct 11, 2023 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think this is less about "can't push down" than "can't pull up",
specifically that prepjointree.c fails to flatten that subquery
into an "appendrel member", per the comments in is_safe_append_member:

Yeah, I agree with you that this is more about "can't pull up".  The
first leaf of the UNION ALL appendrel is a subquery that has WHERE quals
so it fails to be pulled up, as explained in the comments you pointed
out.
 
I don't recall at the moment if there are fundamental reasons not to
have per-child quals in appendrels, or if it could be done with the
application of enough elbow grease.  But it's probably not trivial.
That comment has been there quite awhile.

I'm wondering if we can keep the per-child quals in AppendRelInfos, and
then apply these quals when we create RelOptInfos for the children of an
appendrel, specifically in expand_appendrel_subquery().

I have a go at this and it can fix the origin problem.

EXPLAIN (COSTS OFF)
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456);
                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop
   ->  Index Scan using dealers_pkey on dealers
         Index Cond: (id = ANY ('{54,12,456}'::integer[]))
   ->  Append
         ->  Bitmap Heap Scan on bikes
               Recheck Cond: (dealer_name = dealers.name)
               Filter: (frame_size = 52)
               ->  Bitmap Index Scan on bikes_dealer_name_idx
                     Index Cond: (dealer_name = dealers.name)
         ->  Bitmap Heap Scan on cars
               Recheck Cond: (dealer_name = dealers.name)
               ->  Bitmap Index Scan on cars_dealer_name_idx
                     Index Cond: (dealer_name = dealers.name)
(13 rows)

However, when applying the per-child quals in expand_appendrel_subquery,
I cannot find a way to make these quals go through the EC machinery.
And that would cause us to miss some optimal paths, such as

EXPLAIN (COSTS OFF)
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52 and frame_size = id
    UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars;
                        QUERY PLAN
-----------------------------------------------------------
 Append
   ->  Seq Scan on bikes
         Filter: ((frame_size = 52) AND (frame_size = id))
   ->  Seq Scan on cars
(4 rows)

We should have been able to get 'bikes.id = 52' from '(frame_size = 52)
AND (frame_size = id)', and then use index scan on 'bikes', but ...

Maybe we can achieve that with more efforts, but I'm not sure if this is
worthwhile.  Any thoughts?

Thanks
Richard