Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

Поиск
Список
Период
Сортировка
От Lukas Fittl
Тема Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Дата
Msg-id CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Hi,

I was debugging a planner problem on Postgres 14.4 the other day - and the involved "bad" plan was including Memoize - though I don't necessarily think that Memoize is to blame (and this isn't any of the problems recently fixed in Memoize costing).

However, what I noticed whilst trying different ways to fix the plan, is that the Memoize output was a bit hard to reason about - especially since the plan involving Memoize was expensive to run, and so I was mostly running EXPLAIN without ANALYZE to look at the costing.

Here is an example of the output I was looking at:

     ->  Nested Loop  (cost=1.00..971672.56 rows=119623 width=0)
           ->  Index Only Scan using table1_idx on table1  (cost=0.43..372676.50 rows=23553966 width=8)
           ->  Memoize  (cost=0.57..0.61 rows=1 width=8)
                 Cache Key: table1.table2_id
                 Cache Mode: logical
                 ->  Index Scan using table2_idx on table2  (cost=0.56..0.60 rows=1 width=8)
                       Index Cond: (id = table1.table2_id)

The other plan I was comparing with (that I wanted the planner to choose instead), had a total cost of 1,451,807.35 -- and so I was trying to figure out why the Nested Loop was costed as 971,672.56.

Simple math makes me expect the Nested Loop should roughly have a total cost of14,740,595.76 here (372,676.50 + 23,553,966 * 0.61), ignoring a lot of the smaller costs. Thus, in this example, it appears Memoize made the plan cost significantly cheaper (roughly 6% of the regular cost).

Essentially this comes down to the "cost reduction" performed by Memoize only being implicitly visible in the Nested Loop's total cost - and with nothing useful on the Memoize node itself - since the rescan costs are not shown.

I think explicitly adding the estimated cache hit ratio for Memoize nodes might make this easier to reason about, like this:

->  Memoize  (cost=0.57..0.61 rows=1 width=8)
     Cache Key: table1.table2_id
     Cache Mode: logical
     Cache Hit Ratio Estimated: 0.94

Alternatively (or in addition) we could consider showing the "ndistinct" value that is calculated in cost_memoize_rescan - since that's the most significant contributor to the cache hit ratio (and you can influence that directly by improving the ndistinct statistics).

See attached a patch that implements showing the cache hit ratio as a discussion starter.

I'll park this in the July commitfest for now.

Thanks,
Lukas

--
Lukas Fittl
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Request for comment on setting binary format output per session
Следующее
От: Dave Cramer
Дата:
Сообщение: Re: Request for comment on setting binary format output per session