Thread: Bad Estimate for multi tenant database queries

Bad Estimate for multi tenant database queries

From: Peter Grman
Date:
Hello,

I've noticed that our queries have very bad estimates, which leads the planner to use slow nested loops. Here is a subset of the query without tenant separation (correct estimates):

explain (ANALYZE, COSTS, BUFFERS, FORMAT text)
    select *
    from "Reservation"."Reservation" r
    join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
    join "Reservation"."NoShowFee" f on r."NoShowFeeId" = f."Id"
    where r."DepartureUtc" > '2018-01-01' and r."ArrivalUtc" < '2018-09-30'

Gather  (cost=14034.74..22788.40 rows=12346 width=793) (actual time=23.815..57.178 rows=12263 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=18997
  ->  Hash Join  (cost=13034.74..20553.80 rows=5144 width=793) (actual time=20.869..49.029 rows=4088 loops=3)
        Hash Cond: (r."TimeSliceDefinitionId" = t."Id")
        Buffers: shared hit=18997
        ->  Parallel Hash Join  (cost=12907.55..20413.09 rows=5144 width=662) (actual time=19.210..45.177 rows=4088 loops=3)
              Hash Cond: (f."Id" = r."NoShowFeeId")
              Buffers: shared hit=18683
              ->  Parallel Seq Scan on "NoShowFee" f  (cost=0.00..7343.25 rows=61825 width=143) (actual time=0.006..15.481 rows=49460 loops=3)
                    Buffers: shared hit=6725
              ->  Parallel Hash  (cost=12843.25..12843.25 rows=5144 width=519) (actual time=19.071..19.072 rows=4088 loops=3)
                    Buckets: 16384  Batches: 1  Memory Usage: 4832kB
                    Buffers: shared hit=11958
                    ->  Parallel Seq Scan on "Reservation" r  (cost=0.00..12843.25 rows=5144 width=519) (actual time=0.971..14.919 rows=4088 loops=3)
                          Filter: (("DepartureUtc" > '2018-01-01 00:00:00'::timestamp without time zone) AND ("ArrivalUtc" < '2018-09-30 00:00:00'::timestamp without time zone))
                          Rows Removed by Filter: 43126
                          Buffers: shared hit=11958
        ->  Hash  (cost=96.53..96.53 rows=2453 width=131) (actual time=1.586..1.586 rows=2453 loops=3)
              Buckets: 4096  Batches: 1  Memory Usage: 457kB
              Buffers: shared hit=216
              ->  Seq Scan on "TimeSliceDefinition" t  (cost=0.00..96.53 rows=2453 width=131) (actual time=0.009..0.697 rows=2453 loops=3)
                    Buffers: shared hit=216
Planning Time: 0.424 ms
Execution Time: 58.146 ms

and with tenant separation (wrong estimates):

explain (ANALYZE, COSTS, BUFFERS, FORMAT text)
    select *
    from "Reservation"."Reservation" r
    join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
    join "Reservation"."NoShowFee" f on r."NoShowFeeId" = f."Id"
    where r."DepartureUtc" > '2018-01-01' and r."ArrivalUtc" < '2019-12-31' and r."AccountCode" = 'Code1' and t."AccountCode" = 'Code1' and f."AccountCode" = 'Code1'

Nested Loop  (cost=419.37..6656.11 rows=3 width=793) (actual time=1.367..95.051 rows=8992 loops=1)
  Buffers: shared hit=41970
  ->  Nested Loop  (cost=418.95..6504.77 rows=49 width=650) (actual time=1.355..49.789 rows=8992 loops=1)
        Buffers: shared hit=5980
        ->  Bitmap Heap Scan on "TimeSliceDefinition" t  (cost=4.39..39.99 rows=14 width=131) (actual time=0.015..0.035 rows=14 loops=1)
              Recheck Cond: ("AccountCode" = 'Code1'::text)
              Heap Blocks: exact=7
              Buffers: shared hit=9
              ->  Bitmap Index Scan on "IX_TimeSliceDefinition_AccountCode_PropertyId_Name"  (cost=0.00..4.39 rows=14 width=0) (actual time=0.010..0.010 rows=14 loops=1)
                    Index Cond: ("AccountCode" = 'Code1'::text)
                    Buffers: shared hit=2
        ->  Bitmap Heap Scan on "Reservation" r  (cost=414.56..461.66 rows=11 width=519) (actual time=1.104..2.987 rows=642 loops=14)
              Recheck Cond: (("TimeSliceDefinitionId" = t."Id") AND ("AccountCode" = 'Code1'::text))
              Filter: (("DepartureUtc" > '2018-01-01 00:00:00'::timestamp without time zone) AND ("ArrivalUtc" < '2019-12-31 00:00:00'::timestamp without time zone))
              Rows Removed by Filter: 14
              Heap Blocks: exact=4776
              Buffers: shared hit=5971
              ->  BitmapAnd  (cost=414.56..414.56 rows=12 width=0) (actual time=1.057..1.057 rows=0 loops=14)
                    Buffers: shared hit=1195
                    ->  Bitmap Index Scan on "IX_Reservation_TimeSliceDefinitionId"  (cost=0.00..13.84 rows=189 width=0) (actual time=0.063..0.063 rows=665 loops=14)
                          Index Cond: ("TimeSliceDefinitionId" = t."Id")
                          Buffers: shared hit=90
                    ->  Bitmap Index Scan on "IX_Reservation_AccountCode_EntityId"  (cost=0.00..398.31 rows=8786 width=0) (actual time=1.056..1.056 rows=9225 loops=13)
                          Index Cond: ("AccountCode" = 'Code1'::text)
                          Buffers: shared hit=1105
  ->  Index Scan using "PK_NoShowFee" on "NoShowFee" f  (cost=0.42..3.09 rows=1 width=143) (actual time=0.003..0.003 rows=1 loops=8992)
        Index Cond: ("Id" = r."NoShowFeeId")
        Filter: ("AccountCode" = 'Code1'::text)
        Buffers: shared hit=35990
Planning Time: 0.766 ms
Execution Time: 95.687 ms

Now, these two queries are fairly similar in execution time. The problem is that this is kind of a base query, and our ORM builds a monster query around it which takes 2-3 minutes to execute (for one specific tenant - not for others). I took the advice from https://www.postgresql.org/message-id/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com and tried the same query with SET enable_nestloop=OFF - it took 200ms once it used hash joins.
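
A minimal sketch of scoping that override to a single statement (SET LOCAL only affects the current transaction; the query body below is a placeholder):

BEGIN;
SET LOCAL enable_nestloop = off;  -- scoped to this transaction only
explain (ANALYZE, COSTS, BUFFERS, FORMAT text)
    select ...;  -- the full ORM-generated query goes here
ROLLBACK;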

How can I improve the estimates? Just as a catch-all, I tried setting default_statistics_target to 10000 and re-analyzed the DB, but it didn't help. I tried to create ndistinct statistics to include the account code in the statistics:

CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";

but that didn't help either

So I know the issue - it's using a nested loop instead of a hash join for that one account, which has a ton of data; all the others have fewer rows, which is why they are fine, although even the other accounts would benefit from hash joins - one test had 0.15ms execution time instead of 95.5ms.
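
To see the per-tenant skew directly, a simple count per account is enough (a sketch):

select "AccountCode", count(*) as reservation_count
from "Reservation"."Reservation"
group by "AccountCode"
order by count(*) desc;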

How can I help Postgres collect the right statistics?

Thank you,
Peter

Re: Bad Estimate for multi tenant database queries

From: Michael Lewis
Date:
CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";

but that didn't help either

Did you try with 'dependencies' for the statistics_kind or only ndistinct? What was default_statistics_target set to at the time you created the extended statistics? I am not sure if that value is used, but I would assume so.
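
For example, a sketch of the dependencies variant (the statistics name here is illustrative, and a fresh ANALYZE is needed before the planner sees it):

CREATE STATISTICS mt_reservation_deps (dependencies)
    on "AccountCode", "TimeSliceDefinitionId"
    from "Reservation"."Reservation";
ANALYZE "Reservation"."Reservation";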

Re: Bad Estimate for multi tenant database queries

From: Peter Grman
Date:
Hello Michael,

I dug a bit deeper and found an even simpler query, which perfectly embodies the problem:

explain (ANALYZE, FORMAT text)
select *
from "Reservation"."Reservation" r
join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
where r."AccountCode" = 'OXHV' and t."AccountCode" = 'OXHV'

Nested Loop  (cost=448.56..6918.67 rows=54 width=657) (actual time=1.207..26.874 rows=9322 loops=1)
  ->  Bitmap Heap Scan on "TimeSliceDefinition" t  (cost=4.39..39.99 rows=14 width=131) (actual time=0.013..0.029 rows=14 loops=1)
        Recheck Cond: ("AccountCode" = 'OXHV'::text)
        Heap Blocks: exact=7
        ->  Bitmap Index Scan on "IX_TimeSliceDefinition_AccountCode_EntityId"  (cost=0.00..4.39 rows=14 width=0) (actual time=0.009..0.009 rows=14 loops=1)
              Index Cond: ("AccountCode" = 'OXHV'::text)
  ->  Bitmap Heap Scan on "Reservation" r  (cost=444.17..491.21 rows=12 width=526) (actual time=1.021..1.755 rows=666 loops=14)
        Recheck Cond: (("TimeSliceDefinitionId" = t."Id") AND ("AccountCode" = 'OXHV'::text))
        Heap Blocks: exact=4858
        ->  BitmapAnd  (cost=444.17..444.17 rows=12 width=0) (actual time=0.980..0.980 rows=0 loops=14)
              ->  Bitmap Index Scan on "IX_Reservation_TimeSliceDefinitionId"  (cost=0.00..13.82 rows=187 width=0) (actual time=0.057..0.057 rows=692 loops=14)
                    Index Cond: ("TimeSliceDefinitionId" = t."Id")
              ->  Bitmap Index Scan on "IX_Reservation_AccountCode_EntityId"  (cost=0.00..427.72 rows=9507 width=0) (actual time=0.980..0.980 rows=9327 loops=13)
                    Index Cond: ("AccountCode" = 'OXHV'::text)
Planning Time: 0.353 ms
Execution Time: 27.311 ms

Above is the query with the wrong estimates (a factor of ~200x off), and below the query with correct estimates:

explain (ANALYZE, FORMAT text)
select *
from "Reservation"."Reservation" r
join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
where r."AccountCode" = 'OXHV'

Hash Join  (cost=557.83..12519.98 rows=9507 width=657) (actual time=3.290..15.174 rows=9322 loops=1)
  Hash Cond: (r."TimeSliceDefinitionId" = t."Id")
  ->  Bitmap Heap Scan on "Reservation" r  (cost=430.10..12367.25 rows=9507 width=526) (actual time=1.931..10.821 rows=9322 loops=1)
        Recheck Cond: ("AccountCode" = 'OXHV'::text)
        Heap Blocks: exact=4666
        ->  Bitmap Index Scan on "IX_Reservation_AccountCode_EntityId"  (cost=0.00..427.72 rows=9507 width=0) (actual time=1.398..1.398 rows=9327 loops=1)
              Index Cond: ("AccountCode" = 'OXHV'::text)
  ->  Hash  (cost=96.77..96.77 rows=2477 width=131) (actual time=1.312..1.313 rows=2511 loops=1)
        Buckets: 4096  Batches: 1  Memory Usage: 466kB
        ->  Seq Scan on "TimeSliceDefinition" t  (cost=0.00..96.77 rows=2477 width=131) (actual time=0.004..0.550 rows=2511 loops=1)
Planning Time: 1.394 ms
Execution Time: 15.641 ms

Given that the AccountCode should actually be the same in all referenced rows, the extra predicate is really just a double check, in case we ever have cross-tenant (cross-account) references. The extra "and t."AccountCode" = 'OXHV'" is added by the ORM as a safety net; we'd like to keep it that way, but Postgres shouldn't consider it for the row estimates.
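
If I read the numbers right, this looks like the planner's independence assumption at work: it estimates 9507 "Reservation" rows for the account, then scales the join down again by the "TimeSliceDefinition" selectivity for the same predicate, 14 of 2477 rows, i.e. 9507 * 14 / 2477 ≈ 54 - exactly the bad estimate above, even though the two predicates are fully correlated.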

I've tried creating the following statistics:

CREATE STATISTICS MT_ReservationBucket on "AccountCode", "DepartureUtc", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationDepartureUtc on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationArrivalUtc on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationNoShowFeeId on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationTimeSliceDefinitionId on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";
CREATE STATISTICS MT_TimeSliceDefinition on "AccountCode", "Id" from "Rates"."TimeSliceDefinition";
CREATE STATISTICS MT_NoShowFee on "AccountCode", "Id" from "Reservation"."NoShowFee";

I tried creating indexes on Id+AccountCode and TimeSliceDefinitionId+AccountCode.
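
Roughly like this (a sketch; the index names are illustrative):

CREATE INDEX ix_timeslicedefinition_id_accountcode
    on "Rates"."TimeSliceDefinition" ("Id", "AccountCode");
CREATE INDEX ix_reservation_timeslicedefinitionid_accountcode
    on "Reservation"."Reservation" ("TimeSliceDefinitionId", "AccountCode");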

I tried setting up a foreign key using both columns instead of just one:

alter table "Reservation"."Reservation"
    add constraint "FK_Reservation_TimeSliceDefinition_TimeSliceDefinitionId_test"
        foreign key ("AccountCode", "TimeSliceDefinitionId") references "Rates"."TimeSliceDefinition" ("AccountCode", "Id")
            on delete restrict;

I also tried switching default_statistics_target to 10000 and running a full ANALYZE again afterwards, with those statistics and the extra indexes in place, but nothing had any effect. (Maybe the row estimate grew to 55 instead of 54.)
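
One sanity check (a sketch; on Postgres 11 the computed values still live directly in pg_statistic_ext) is to confirm the extended statistics were actually populated after ANALYZE:

select stxname, stxkind, stxndistinct, stxdependencies
from pg_statistic_ext
where stxname like 'mt_%';  -- the unquoted names from above fold to lowercase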


If I understood that discussion correctly (and that's a big IF), the feature in question might be able to solve my problem. Am I correct? Or did I completely misunderstand it? Is there anything I can do until that is released? (We're currently on Postgres 11 on RDS.)


Thanks for your time!
Peter


On Tue, Sep 3, 2019 at 8:40 PM Michael Lewis <mlewis@entrata.com> wrote:
CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";

but that didn't help either

Did you try with 'dependencies' for the statistics_kind or only ndistinct? What was default_statistics_target set to at the time you created the extended statistics? I am not sure if that value is used, but I would assume so.