Re: index fragmentation on insert-only table with non-unique column

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: index fragmentation on insert-only table with non-unique column
Дата
Msg-id CAH2-Wzn1i=MN6smuUxaq=oMhMt_5xWVEfL+u5V+nDKcYMyUydQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: index fragmentation on insert-only table with non-unique column  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On Tue, May 24, 2016 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  I wonder what would happen if we used the same rule for index
> insertions.  It would likely make insertions more expensive, but maybe
> not by much.  The existing "randomization" rule for where to insert new
> items in a run of identical index entries would go away, because the
> insertion point would become deterministic.  I am not sure if that's
> good or bad for insertion performance, but it would likely help for
> scan performance.

I think that if somebody tacked on a tie-breaker in the same way as in
tuplesort.c's B-Tree IndexTuple, there'd be significant negative
consequences.

The equal-to-high-key case gets a choice of which page to put the new
IndexTuple on, and I imagine that that's quite useful when it comes
up. I'd also have concerns about the key space in the index. I think
that it would seriously mess with the long term utility of values in
internal pages, which currently can reasonably have little to do with
the values currently stored in leaf pages. They're functionally only
separators of the key space that guide index scans, so it doesn't
matter if the actual values are completely absent from the leaf
pages/the table itself (perhaps some of the values in the internal
pages were inserted years ago, and have long since been deleted and
vacuumed away). Provided the distribution of values at the leaf level
is still well characterized at higher levels (e.g. many string values
that start with vowels, very few that start with the letters 'x' or
'z'), there should be no significant bloat. That's very valuable.
Unique indexes are another problem for this naive approach.

Maybe somebody could do better with a more sophisticated approach, but
it's probably better to focus on duplicate storage or even leaf page
compression, as Stephen mentioned.

--
Peter Geoghegan


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: index fragmentation on insert-only table with non-unique column
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: index fragmentation on insert-only table with non-unique column