Обсуждение: BitmapAnd on correlated column?

Поиск
Список
Период
Сортировка

BitmapAnd on correlated column?

От
greigwise
Дата:
I'm running the following query on Postgres version 10.8:

SELECT  count(*) FROM test_table WHERE and id_column_1 IN (9954, 9690, 9689,
9688) AND
id_column_2 IN (75328, 51448, 48060, 48065, 51803, 51449, 51802, 48064,
48061, 48062, 48059, 48063, 121834, 326649, 303167, 303249, 336838, 334172,
303419, 48148, 329727, 320007, 205066, 205065, 75406, 48153, 56298, 51570,
62668, 48150, 110618, 48154, 48147, 110626, 96830, 110624, 110615, 110625,
110621, 110617, 115865, 196128, 179940, 156843, 186994, 125995, 206982,
240753, 245896, 131002, 215463, 303111, 303399, 336923, 325403, 336189,
335278, 335271, 334985, 325014, 59681, 48277, 68237, 48067, 48072, 49379,
60063, 85903, 117843, 51451, 65681, 117848, 117845, 48068, 48066, 48071,
50323, 51450, 48069, 121835, 117844, 48070, 49310, 293247, 134761, 210493,
325404, 117847, 117846, 169399, 187795, 117849, 200056, 309165, 106874,
310255, 117343, 240831, 117842, 270870, 259300, 259291, 307661, 335006,
329259, 119279, 293474, 303516, 119277, 303557, 307101, 302380, 301949,
252399, 51988, 65660, 68236, 51987, 86026, 50557, 82723, 51452, 50550,
50552, 50559, 70009, 50554, 50551, 50555, 50553, 55955, 82766, 119274,
57069, 58184, 111317, 84796, 122502, 119275, 119270, 119273, 89111, 307110,
105797, 105798, 57068, 119280, 122495, 132483, 293249, 140929, 307666,
307660, 210494, 294051, 210507, 211284, 74299, 235551, 119281, 269433,
269434, 269891, 240828, 50556, 307977);

There are 2 indexes, one on id_column_1 and one on id_column_2.

The plan looks like this:

 Aggregate  (cost=732.87..732.88 rows=1 width=8) (actual time=33.835..33.835
rows=1 loops=1)
   ->  Bitmap Heap Scan on test_table  (cost=701.88..732.82 rows=18 width=0)
(actual time=13.073..31.288 rows=24965 loops=1)
         Recheck Cond: ((id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[])) AND (id_column_2 = ANY

('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[])))
         Heap Blocks: exact=20879
         ->  BitmapAnd  (cost=701.88..701.88 rows=18 width=0) (actual
time=9.904..9.904 rows=0 loops=1)
               ->  Bitmap Index Scan on index_on_col1  (cost=0.00..95.31
rows=7676 width=0) (actual time=5.345..5.345 rows=28346 loops=1)
                     Index Cond: (id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[]))
               ->  Bitmap Index Scan on index_on_col2  (cost=0.00..606.32
rows=37941 width=0) (actual time=3.395..3.395 rows=24965 loops=1)
                     Index Cond: (id_column_2 = ANY

('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
 Planning time: 1.452 ms
 Execution time: 34.036 ms

The thing is that id_column_1 is really dependent on id_column_2.  So
there's really no point in scanning the index on id_column_1.  In fact, if I
remove that in clause for id_column_1 from the query, I get a better plan:

 Aggregate  (cost=14045.14..14045.15 rows=1 width=8) (actual
time=22.743..22.743 rows=1 loops=1)
   ->  Index Only Scan using index_on_col2 on test_table
(cost=0.43..13950.28 rows=37941 width=0) (actual time=0.130..19.880
rows=24965 loops=1)
         Index Cond: (id_column_2 = ANY

('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
         Heap Fetches: 24965
 Planning time: 0.647 ms
 Execution time: 22.781 ms

I thought maybe extended statistics would help, so I did this:

create statistics test (dependencies) on id_column_2, id_column_1 from
test_table;
analyze test_table;

But the plan was nearly identical to the first plan with the BitmapAND even
after creating the extended statistics:

 Aggregate  (cost=738.12..738.13 rows=1 width=8) (actual time=34.207..34.207
rows=1 loops=1)
   ->  Bitmap Heap Scan on test_table  (cost=707.13..738.07 rows=18 width=0)
(actual time=14.230..31.782 rows=24965 loops=1)
         Recheck Cond: ((id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[])) AND (id_column_2 = ANY

('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[])))
         Heap Blocks: exact=20879
         ->  BitmapAnd  (cost=707.13..707.13 rows=18 width=0) (actual
time=10.710..10.710 rows=0 loops=1)
               ->  Bitmap Index Scan on index_on_col1  (cost=0.00..95.58
rows=7711 width=0) (actual time=5.232..5.232 rows=28346 loops=1)
                     Index Cond: (id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[]))
               ->  Bitmap Index Scan on index_on_col2  (cost=0.00..611.30
rows=38750 width=0) (actual time=4.010..4.010 rows=24965 loops=1)
                     Index Cond: (id_column_2 = ANY

('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
 Planning time: 1.073 ms
 Execution time: 34.331 ms

So, I'm just wondering if there's anything I can do to influence the
optimize to pick the better plan using just the one index on id_column_2
(aside from re-writing the query).

Thanks,
Greig Wise




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: BitmapAnd on correlated column?

От
Rob Sargent
Дата:
On 10/3/19 3:22 PM, greigwise wrote:
> I'm running the following query on Postgres version 10.8:
>
> SELECT  count(*) FROM test_table WHERE and id_column_1 IN (9954, 9690, 9689,
> 9688) AND
> id_column_2 IN (75328, 51448, 48060, 48065, 51803, 51449, 51802, 48064,
>
> -
> Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>
>


Is "where and" just a typo?




Re: BitmapAnd on correlated column?

От
greigwise
Дата:

Re: BitmapAnd on correlated column?

От
Laurenz Albe
Дата:
On Thu, 2019-10-03 at 14:22 -0700, greigwise wrote:
> I'm running the following query on Postgres version 10.8:
> 
> SELECT  count(*) FROM test_table WHERE and id_column_1 IN (9954, 
> 9690, 9689, 9688) AND id_column_2 IN ([long list]);
> 
> There are 2 indexes, one on id_column_1 and one on id_column_2.  
> 
> The plan looks like this:
> 
>  Aggregate
>    ->  Bitmap Heap Scan on test_table
>          ->  BitmapAnd
>                ->  Bitmap Index Scan on index_on_col1
>                ->  Bitmap Index Scan on index_on_col2
>  Planning time: 1.452 ms
>  Execution time: 34.036 ms
> 
> The thing is that id_column_1 is really dependent on id_column_2.  So
> there's really no point in scanning the index on id_column_1.  In 
> fact, if I remove that in clause for id_column_1 from the query, I 
> get a better plan:
> 
>  Aggregate
>    ->  Index Only Scan using index_on_col2 on test_table 
>  Planning time: 0.647 ms
>  Execution time: 22.781 ms
> 
> I thought maybe extended statistics would help, so I did this:
> 
> create statistics test (dependencies) on id_column_2, id_column_1
> from test_table;
> analyze test_table;
> 
> But the plan was nearly identical to the first plan with the
> BitmapAND even after creating the extended statistics:
> 
> So, I'm just wondering if there's anything I can do to influence the
> optimize to pick the better plan using just the one index on 
> id_column_2 (aside from re-writing the query).  

Extended statistics will tell PostgreSQL that it is very unlikely
that the first condition will contribute significantly, but that
is no proof that the condition can be omitted, so the optimizer
cannot just skip the condition.

You'll have to rewrite the query.

If one condition depends on the other, consider normalizing the table.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: BitmapAnd on correlated column?

От
greigwise
Дата:
>Extended statistics will tell PostgreSQL that it is very unlikely
>that the first condition will contribute significantly, but that
>is no proof that the condition can be omitted, so the optimizer
>cannot just skip the condition.

Granted it cannot skip the condition, but that doesn't mean that it has to
use that second index.  It's doing a  recheck on the conditions anyway,
right?

Thanks.
Greig



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: BitmapAnd on correlated column?

От
Tom Lane
Дата:
greigwise <greigwise@comcast.net> writes:
> Granted it cannot skip the condition, but that doesn't mean that it has to
> use that second index.  It's doing a  recheck on the conditions anyway,
> right?

I'm afraid you're out of luck on that for now.  In principle, the
presence of extended stats on the column combination might allow
the planner to figure out that the second IN condition adds little
selectivity.  But I'm not sure how hard it would be, and I am sure
that the extended-stats logic hasn't been built out in that direction
yet.

            regards, tom lane