Обсуждение: Bitmap scan seem like such a strange choice when "limit 1"

Поиск
Список
Период
Сортировка

Bitmap scan seem like such a strange choice when "limit 1"

От
Klaudie Willis
Дата:
Postgres 13 beta3

set enable_bitmapscan=1; -- default
explain (analyze,buffers) 
select *
from bigtable 
where cars_ref = 1769854207 and t > '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
limit 1

Short story.  Big table > 100M rows. b-tree index on cars_ref, the t constraints limits it to one partition, but I don't think that is very relevant.  In any case, running this query takes 1.5s:

Limit  (cost=23728.33..23729.24 rows=1 width=635) (actual time=1516.865..1516.867 rows=1 loops=1)
  Buffers: shared hit=2376
  ->  Bitmap Heap Scan on bigtable_y2020 bigtable (cost=23728.33..2530109.01 rows=2730872 width=635) (actual time=1516.863..1516.864 rows=1 loops=1)
        Recheck Cond: (cars_ref = 1769854207)
        Filter: ((t > '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Heap Blocks: exact=1
        Buffers: shared hit=2376
        ->  Bitmap Index Scan on bigtable_y2020_cars_ref_idx  (cost=0.00..23045.61 rows=2731965 width=0) (actual time=751.640..751.640 rows=2817675 loops=1)
              Index Cond: (cars_ref = 1769854207)
              Buffers: shared hit=2375
Planning Time: 0.365 ms
Execution Time: 1540.207 ms

1.5 seconds seems a lot for a single indexed row.  I would think it should be instant, and so it is when I disable bitmap scan with: set enable_bitmapscan=0

Limit  (cost=0.57..1.60 rows=1 width=636) (actual time=0.027..0.028 rows=1 loops=1)
  Buffers: shared hit=5
  ->  Index Scan using bigtable_y2020_cars_ref_idx on bigtable_y2020 bigtable (cost=0.57..2966738.51 rows=2873818 width=636) (actual time=0.026..0.026 rows=1 loops=1)
        Index Cond: (cars_ref = 1769854207)
        Filter: ((t > '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Buffers: shared hit=5
Planning Time: 0.291 ms
Execution Time: 0.049 ms

But I am not supposed to disable bitmap scan!  So why on earth do Postgres 13 beta3 think that returning 1 row, should be done with a bitmap scan?
I noticed that different values for "cars_ref" result in different plans in the query above.  I belive that it has to do with wheter or not the cars_ref is in the "most common value" list.  But in any case, I cant see why a bitmap scan is wise then you expect one row.

best regards
Klaudie

Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Laurenz Albe
Дата:
On Fri, 2020-09-04 at 11:42 +0000, Klaudie Willis wrote:
> Postgres 13 beta3
> 
> set enable_bitmapscan=1; -- default
> explain (analyze,buffers) 
> select *
> from bigtable 
> where cars_ref = 1769854207 and t > '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
> limit 1
> 
> Short story.  Big table > 100M rows. b-tree index on cars_ref, the t constraints limits it to one partition, but I
don'tthink that is very relevant.  In any case, running this query takes 1.5s:
 
> 
> Limit  (cost=23728.33..23729.24 rows=1 width=635) (actual time=1516.865..1516.867 rows=1 loops=1)
>   Buffers: shared hit=2376
>   ->  Bitmap Heap Scan on bigtable_y2020 bigtable (cost=23728.33..2530109.01 rows=2730872 width=635) (actual
time=1516.863..1516.864rows=1 loops=1)
 
>         Recheck Cond: (cars_ref = 1769854207)
>         Filter: ((t > '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp
withouttime zone))
 
>         Heap Blocks: exact=1
>         Buffers: shared hit=2376
>         ->  Bitmap Index Scan on bigtable_y2020_cars_ref_idx  (cost=0.00..23045.61 rows=2731965 width=0) (actual
time=751.640..751.640rows=2817675 loops=1)
 
>               Index Cond: (cars_ref = 1769854207)
>               Buffers: shared hit=2375
> Planning Time: 0.365 ms
> Execution Time: 1540.207 ms
> 
> 1.5 seconds seems a lot for a single indexed row.  I would think it should be instant, and so it is when I disable
bitmapscan with: set enable_bitmapscan=0
 
> 
> Limit  (cost=0.57..1.60 rows=1 width=636) (actual time=0.027..0.028 rows=1 loops=1)
>   Buffers: shared hit=5
>   ->  Index Scan using bigtable_y2020_cars_ref_idx on bigtable_y2020 bigtable (cost=0.57..2966738.51 rows=2873818
width=636)(actual time=0.026..0.026 rows=1 loops=1)
 
>         Index Cond: (cars_ref = 1769854207)
>         Filter: ((t > '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp
withouttime zone))
 
>         Buffers: shared hit=5
> Planning Time: 0.291 ms
> Execution Time: 0.049 ms
> 
> But I am not supposed to disable bitmap scan!  So why on earth do Postgres 13 beta3 think that returning 1 row,
shouldbe done with a bitmap scan?
 
> I noticed that different values for "cars_ref" result in different plans in the query above.  I belive that it has to
dowith wheter or not the cars_ref is in the "most common value" list.  But in
 
> any case, I cant see why a bitmap scan is wise then you expect one row.

PostgreSQL estimates that 2817675 rows satisfy the index condition and expects
that it will have to scan many of them before it finds one that satisfies the
filter condition.  That turns out to be a wrong guess.

You could create an index on (cars_ref, t), then PostgreSQL will certainly
pick an index scan.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Klaudie Willis
Дата:
> PostgreSQL estimates that 2817675 rows satisfy the index condition and expects
> that it will have to scan many of them before it finds one that satisfies the
> filter condition. That turns out to be a wrong guess.
>
> You could create an index on (cars_ref, t), then PostgreSQL will certainly
> pick an index scan.


Thanks!  But, the t (time constraint) already isolates a particular partition.  The bigtable is partitioned on exactly
t,by year.  This is why you do not see any other indexes/partitions being queried in the EXPLAIN. 

...
PARTITION BY RANGE (t)
...
CREATE TABLE public.bigtable_y2020 PARTITION OF public.bigtable
    FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2021-01-01 00:00:00');

To me, it seems like filter on date is unnecessary when you already IS on such a partition!

I'd like to add, that when I do the same query DIRECTLY on the bigtable_y2020 (instead of the partition parent) it does
changeto "index scan" again. 

best regards
K




Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Tom Lane
Дата:
Klaudie Willis <Klaudie.Willis@protonmail.com> writes:
> I'd like to add, that when I do the same query DIRECTLY on the bigtable_y2020 (instead of the partition parent) it
doeschange to "index scan" again. 

Yeah.  I think the issue here is that add_paths_to_append_rel only
considers cheapest-total paths for the member relations.  Seeing
that it's already considering a slightly ridiculous number of
parallelization options, I'm hesitant to throw in cheapest-startup
considerations as well, for fear of blowing out planning time.

Maybe the right way to improve this is to bypass add_paths_to_append_rel
entirely when there's exactly one surviving child rel, and make it
just use all the surviving paths for that child.

            regards, tom lane



Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Michael Lewis
Дата:
Index Scan using bigtable_y2020_cars_ref_idx on bigtable_y2020 bigtable (cost=0.57..2966738.51 rows=2873818 width=636) (actual time=0.026..0.026 rows=1 loops=1)

Given the system expects to get almost 3 million rows when it should be just 1, it seems like a stats problem to me. How is ndistinct on that column compared to reality?

Fw: Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Klaudie Willis
Дата:
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, September 7, 2020 9:04 AM, Klaudie Willis <Klaudie.Willis@protonmail.com> wrote:

 > Maybe the right way to improve this is to bypass add_paths_to_append_rel
 > entirely when there's exactly one surviving child rel, and make it
 > just use all the surviving paths for that child.
 > regards, tom lane

Does this classify as a bug, or is it "by design"? Should I submit one for PG13beta3? To me, this seems like quite a
straightforward case. Big database partitioned, two key lookups, both indexed where one key isolates a partition, and
witha limit constraint. Yet it chooses a plan that is 1000x slower than the optimal plan. 

K





Fw: Re: Bitmap scan seem like such a strange choice when "limit 1"

От
Klaudie Willis
Дата:
> t > '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
>Not at all important, but it seems odd to be exclusive of the start and end both. I would >consider including the start with >=
>Michael Lewis  |  Database Engineer
>Entrata


Michael,  funny I was thinking that myself minutes after posting. Perhaps it is that tiny gap that makes a difference; however changing it to t >= '2020....etc' and perfectly matching the partition range, did not change anything of significance in the explain or runtime. :-|

On that other topic, n_distinct, it is for the moment indeed hardcoded to -0,1. I have tried to reset n_distinct, and run analyze with default_target_statistics = 2000; no dice!
However, the cars_ref in question, is present in the most_common_vals of pg_stats, and according to that frequency array, that value occurs with a frequency of 1,7%.  That seems correct.  

select count(*)
from bigtablet
where cars_ref = 1769854207
and  t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp; 
--> 2 817 169

I can add that car_ref in general is quite skewed in its distribution, but I don't think that is the issue here.
I think the key hint is that when targeting the partition child table directly, the plan changes.  See below for "proof"

explain (analyze,buffers) 
select *
from bigtable
where car_ref = 1769854207 and t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp
limit 1


Limit  (cost=24961.76..24962.67 rows=1 width=636) (actual time=1456.315..1456.316 rows=1 loops=1)
  Buffers: shared hit=2377
  ->  Bitmap Heap Scan on bigtable_y2020 bigtable  (cost=24961.76..2640351.94 rows=2874279 width=636) (actual time=1456.313..1456.314 rows=1 loops=1)
        Recheck Cond: (car_ref = 1769854207)
        Filter: ((t >= '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Heap Blocks: exact=1
        Buffers: shared hit=2377
        ->  Bitmap Index Scan on bigtable_2020_ref_index  (cost=0.00..24243.19 rows=2874336 width=0) (actual time=721.428..721.428 rows=2817169 loops=1)
              Index Cond: (car_ref = 1769854207)
              Buffers: shared hit=2376
Planning Time: 0.321 ms
Execution Time: 1480.087 ms


explain (analyze,buffers) 
select *
from bigtable_y2020  tt
where car_ref = 1769854207 and t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
limit 1

Limit  (cost=0.57..1.60 rows=1 width=636) (actual time=0.037..0.038 rows=1 loops=1)
  Buffers: shared hit=5
  ->  Index Scan using bigtable_2020_ref_index on bigtable_y2020 tt  (cost=0.57..2967225.58 rows=2874279 width=636) (actual time=0.036..0.036 rows=1 loops=1)
        Index Cond: (car_ref = 1769854207)
        Filter: ((t >= '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Buffers: shared hit=5
Planning Time: 0.349 ms
Execution Time: 0.106 ms
 
best regards
K