Re: Inconsistent query performance based on relation hit frequency

From: Andrei Lepikhov
Subject: Re: Inconsistent query performance based on relation hit frequency
Date:
Msg-id: a67e439f-15f9-4ecc-85f0-30679e3d35bb@gmail.com
In reply to: Inconsistent query performance based on relation hit frequency  (Laura Hausmann <laura@hausmann.dev>)
Responses: Re: Inconsistent query performance based on relation hit frequency
List: pgsql-performance
On 6/27/24 07:50, Laura Hausmann wrote:
> I'd appreciate any and all input on the situation. If I've left out any 
> information that would be useful in figuring this out, please tell me.
Thanks for this curious case - I like it!
First, you can try to avoid "OR" expressions - PostgreSQL has quite a 
limited set of optimisation/prediction tricks for such expressions.
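For example, here is a minimal sketch of such a rewrite. Your query isn't 
quoted here, so its shape is reconstructed from the plans below; the table 
and column names ("objects", "relationships", "followeeId") are assumptions 
taken from the plans and index names:

   -- Assumed original shape:
   --   SELECT * FROM objects
   --   WHERE "userId" IN (SELECT "followeeId" FROM relationships
   --                      WHERE "followerId" = 1)
   --      OR "userId" = 1
   --   ORDER BY id DESC LIMIT 50;

   -- OR-free variant: the planner can choose an index-friendly plan
   -- for each branch independently; UNION removes duplicates in case
   -- a user follows themselves.
   (SELECT o.*
      FROM objects o
      JOIN relationships r ON o."userId" = r."followeeId"
     WHERE r."followerId" = 1)
   UNION
   (SELECT *
      FROM objects
     WHERE "userId" = 1)
   ORDER BY id DESC
   LIMIT 50;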
Second, I see that Postgres predicts the wrong number of tuples. But even 
after using my typical tool [1] to get more precise estimations, I don't 
see a significant profit.
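For reference, enabling that tool looks roughly like this (an assumption on 
my side: the AQO extension must be built for your server version and listed 
in shared_preload_libraries, and the available mode names vary between AQO 
versions - see the README [1]):

   CREATE EXTENSION aqo;
   SET aqo.mode = 'intelligent';  -- a learning mode; check the README
                                  -- for the modes in your AQO version

With the improved estimations, the plans are: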

  Limit  (cost=10832.85..10838.69 rows=50 width=21)
    ->  Gather Merge  (cost=10832.85..10838.92 rows=52 width=21)
          Workers Planned: 2
          Workers Launched: 2
          ->  Sort  (cost=9832.83..9832.90 rows=26 width=21)
                Sort Key: objects.id DESC
                Sort Method: top-N heapsort  Memory: 32kB
                Worker 0:  Sort Method: quicksort  Memory: 32kB
                Worker 1:  Sort Method: quicksort  Memory: 32kB
                ->  Parallel Seq Scan on objects
                      Filter: ((hashed SubPlan 1) OR ("userId" = 1))
                      Rows Removed by Filter: 183372
                      SubPlan 1
                        ->  Nested Loop
                              ->  Index Only Scan using users_pkey on users
                                    Index Cond: (id = 1)
                                    Heap Fetches: 0
                              ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships
                                    Index Cond: ("followerId" = 1)
                                    Heap Fetches: 0
  Planning Time: 0.762 ms
  Execution Time: 43.816 ms

  Limit  (cost=10818.83..10819.07 rows=2 width=21)
    ->  Gather Merge  (cost=10818.83..10819.07 rows=2 width=21)
          Workers Planned: 2
          Workers Launched: 2
          ->  Sort  (cost=9818.81..9818.81 rows=1 width=21)
                Sort Key: objects.id DESC
                Sort Method: quicksort  Memory: 25kB
                Worker 0:  Sort Method: quicksort  Memory: 25kB
                Worker 1:  Sort Method: quicksort  Memory: 25kB
                ->  Parallel Seq Scan on objects
                      Filter: ((hashed SubPlan 1) OR ("userId" = 4))
                      Rows Removed by Filter: 183477
                      SubPlan 1
                        ->  Nested Loop  (cost=0.56..8.61 rows=1 width=4)
                              ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships
                                    Index Cond: ("followerId" = 4)
                                    Heap Fetches: 0
                              ->  Index Only Scan using users_pkey on users
                                    Index Cond: (id = 4)
                                    Heap Fetches: 0
  Planning Time: 0.646 ms
  Execution Time: 30.824 ms

But this was achieved just because of parallel worker utilisation.
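To disable the workers for the session, a standard GUC is enough:

   SET max_parallel_workers_per_gather = 0;

With the workers disabled, we get: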

  Limit  (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.941..75.943 rows=0 loops=1)
    ->  Sort  (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.939..75.940 rows=0 loops=1)
          Sort Key: objects.id DESC
          Sort Method: quicksort  Memory: 25kB
          ->  Seq Scan on objects  (cost=8.61..14635.06 rows=2 width=21) (actual time=75.931..75.932 rows=0 loops=1)
                Filter: ((hashed SubPlan 1) OR ("userId" = 4))
                Rows Removed by Filter: 550430
                SubPlan 1
                  ->  Nested Loop  (cost=0.56..8.61 rows=1 width=4) (actual time=0.039..0.040 rows=0 loops=1)
                        ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships  (cost=0.28..4.29 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1)
                              Index Cond: ("followerId" = 4)
                              Heap Fetches: 0
                        ->  Index Only Scan using users_pkey on users  (cost=0.29..4.31 rows=1 width=4) (never executed)
                              Index Cond: (id = 4)
                              Heap Fetches: 0
  Planning Time: 0.945 ms
  Execution Time: 76.123 ms

So, from the optimiser's point of view, it has done the best it could.
Theoretically, if you have a big table with indexes and must select a 
small number of tuples, the ideal query plan will include parameterised 
NestLoop JOINs. Unfortunately, parameterisation in PostgreSQL can't be 
passed inside a subquery. That could be a reason for new development, 
because MSSQL can do such a trick, but it is a long way off.
Instead, you can try to rewrite your schema and query to avoid subqueries 
in expressions at all, as sketched below.
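As an illustrative sketch (continuing the assumed schema above, and 
assuming a self-follow row doesn't break your application's semantics): if 
every user also "follows" themselves, the OR and the subquery disappear 
and the query becomes a plain join:

   -- One-off data change: add a self-referencing row per user.
   INSERT INTO relationships ("followerId", "followeeId")
   SELECT id, id FROM users
   ON CONFLICT DO NOTHING;  -- skips rows that already exist

   -- The feed query is now a single join with no OR and no subquery.
   SELECT o.*
     FROM objects o
     JOIN relationships r ON r."followeeId" = o."userId"
    WHERE r."followerId" = 1
    ORDER BY o.id DESC
    LIMIT 50;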
I hope this message gave you some insights.

[1] https://github.com/postgrespro/aqo

-- 
regards, Andrei Lepikhov



