Re: Next Steps with Hash Indexes

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Next Steps with Hash Indexes
Дата	27 октября 2021 г. 00:02:18
Msg-id	CA+Tgmoanvcdz4RfpqCb-1pNcf8mGd=3Uw6T68YcSYKnEJWCrfw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Next Steps with Hash Indexes (Simon Riggs <simon.riggs@enterprisedb.com>)
Ответы	Re: Next Steps with Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> With unique data, starting at 1 and monotonically ascending, hash
> indexes will grow very nicely from 0 to 10E7 rows without causing >1
> overflow block to be allocated for any bucket. This keeps the search
> time for such data to just 2 blocks (bucket plus, if present, 1
> overflow block). The small number of overflow blocks is because of the
> regular and smooth way that splits occur, which works very nicely
> without significant extra latency.

It is my impression that with non-unique data things degrade rather
badly. There's no way to split the buckets that are overflowing
without also splitting the buckets that are completely empty or, in
any event, not full enough to need any overflow pages. I think that's
rather awful.

> The probability of bucket collision while we hold the lock is fairly
> low. This is because even with adjacent data values the hash values
> would be spread across multiple buckets, so we would expect the
> contention to be less than we would get on a monotonically increasing
> btree.
>
> So I don't now see any problem from holding the buffer lwlock on the
> bucket while we do multi-buffer operations.

I don't think that contention is the primary concern here. I think one
major concern is interruptibility: a process must be careful not to
hold lwlocks across long stretches of code, because it cannot be
cancelled while it does. Even if the code is bug-free and the database
has no corruption, long pauses before cancels take effect can be
inconvenient. But as soon as you add in those considerations things
get much worse. Imagine a hash index that is corrupted so that there
is a loop in the list of overflow pages. No matter what, we're going
to go into an infinite loop scanning that bucket, but if we're holding
a buffer lock while we do it, there's no way to escape short of
bouncing the entire server. That's pretty bad.

Undetected deadlock is something to think about, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Arjan van de Ven
Дата: 26 октября 2021 г., 23:58:17
Сообщение: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

Следующее

От: Jeff Davis
Дата: 27 октября 2021 г., 00:03:24
Сообщение: Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Next Steps with Hash Indexes

Предыдущее

Следующее