Re: Improve EXPLAIN output for multicolumn B-Tree Index

Поиск
Список
Период
Сортировка
От Jelte Fennema-Nio
Тема Re: Improve EXPLAIN output for multicolumn B-Tree Index
Дата
Msg-id CAGECzQTe6K4WYexTDzwnzQrkom2P_3k8zo0kKnBNgaAN95yEqQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve EXPLAIN output for multicolumn B-Tree Index  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Improve EXPLAIN output for multicolumn B-Tree Index  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote:
> Typically, no, it won't be. But there's really no telling for sure.
> The access patterns for a composite index on '(a, b)' with a qual
> "WHERE b = 5" are identical to a qual explicitly written "WHERE a =
> any(<every possible value in 'a'>) AND b = 5".

Hmm, that's true. But in that case the explain plan gives a clear hint
that something like that might be going on, because you'll see:

Index Cond: a = any(<every possible value in 'a'>) AND b = 5

That does make me think of another way, and maybe more "correct" way,
of representing Masahiros situation than adding a new "Skip Scan Cond"
row to the EXPLAIN output. We could explicitly include a comparison to
all prefix columns in the Index Cond:

Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101))

Or if you want to go with syntactically correct SQL we could do the following:

Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS
NOT NULL)) AND (test.id3 = 101))

An additional benefit this provides is that you now know which
additional column you should use a more specific filter on to speed up
the query. In this case test.id2

OTOH, maybe it's not a great way because actually running that puts
the IS NULL+ IS NOT NULL query in the Filter clause (which honestly
surprises me because I had expected this "always true expression"
would have been optimized away by the planner).

> EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3
=101; 
                                                           QUERY PLAN
─────────────────────────────────────────────────────
 Index Only Scan using test_idx on public.test  (cost=0.42..12809.10
rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1)
   Output: id1, id2, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL))

> What about cases where we legitimately have to vary our strategy
> during the same index scan?

Would my new suggestion above address this?

> In fact, that'd make sense even today, without skip scan (just with
> the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> scans, the number of primitive scans is hard to predict, and quite
> indicative of what's really going on with the scan.

*googles nbtree SAOP scans and finds the very helpful[1]*

Yes, I feel like this should definitely be part of the ANALYZE output.
Seeing how Lukas has to look at pg_stat_user_tables to get this
information seems quite annoying[2] and only possible on systems that
have no concurrent queries.

So it sounds like we'd want a "Primitive Index Scans" counter in
ANALYZE too. In addition to the number of filtered rows by, which if
we go with my proposal above should probably be called "Rows Removed
by Index Cond".

[1]: https://www.youtube.com/watch?v=jg2KeSB5DR8
[2]: https://youtu.be/jg2KeSB5DR8?t=188



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: remaining sql/json patches
Следующее
От: shveta malik
Дата:
Сообщение: Re: Conflict Detection and Resolution