Re: Use of additional index columns in rows filtering

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Use of additional index columns in rows filtering
Дата
Msg-id CAH2-WzmUEzjAvVfjqRYAE0j7YCgVh5ogKJ4GKopXjSYm55SFHQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Use of additional index columns in rows filtering  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

You can see this for yourself, quite easily. Start by running the
relevant query from the regression tests, which is:

SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
= 3 OR tenthous = 42);

EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
slightly faster, as expected. I see 7 buffer hits for the bitmap index
scan plan on master, versus only 4 buffer hits for the patch's index
scan. Obviously, this is because we go from multiple index scans
(multiple bitmap index scans) to only one.

But, if I run this insert statement and try the same thing again,
things look very different:

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

(Bear in mind that we've inserted rows that don't actually need to be
selected by the query in question.)

Now the master branch's plan works in just the same way as before --
it has exactly the same overhead (7 buffer hits). Whereas the patch
still gets the same risky plan -- which now blows up. The plan now
loses by far more than it could ever really hope to win by: 336 buffer
hits. (It could be a lot higher than this, even, but you get the
picture.)

Sure, it's difficult to imagine a very general model that captures
this sort of risk, in the general case. But you don't need a degree in
actuarial science to understand that it's inherently a bad idea to
juggle chainsaws -- no matter what your general risk tolerance happens
to be. Some things you just don't do.

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amul Sul
Дата:
Сообщение: Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
Следующее
От: Yuya Watari
Дата:
Сообщение: Re: [PoC] Reducing planning time when tables have many partitions