Re: Parallel CREATE INDEX for GIN indexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Parallel CREATE INDEX for GIN indexes
Дата
Msg-id 3b721981-6fa3-4698-a9b6-70b2d8e8fa3b@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Parallel CREATE INDEX for GIN indexes  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: Parallel CREATE INDEX for GIN indexes
Список pgsql-hackers
On 5/9/24 12:14, Andy Fan wrote:
> 
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> 
>> 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch
>>
>> In 0002 the workers still do an explicit qsort() on the TID list before
>> writing the data into the shared tuplesort. But we can do better - the
>> workers can do a merge sort too. To help with this, we add the first TID
>> to the tuplesort tuple, and sort by that too - it helps the workers to
>> process the data in an order that allows simple concatenation instead of
>> the full mergesort.
>>
>> Note: There's a non-obvious issue due to parallel scans always being
>> "sync scans", which may lead to very "wide" TID ranges when the scan
>> wraps around. More about that later.
> 
> This is really amazing.
> 
>> 7) v20240502-0007-Detect-wrap-around-in-parallel-callback.patch
>>
>> There's one more efficiency problem - the parallel scans are required to
>> be synchronized, i.e. the scan may start half-way through the table, and
>> then wrap around. Which however means the TID list will have a very wide
>> range of TID values, essentially the min and max of for the key.
>>
>> Without 0006 this would cause frequent failures of the index build, with
>> the error I already mentioned:
>>
>>   ERROR: could not split GIN page; all old items didn't fit
> 
> I have two questions here and both of them are generall gin index questions
> rather than the patch here.
> 
> 1. What does the "wrap around" mean in the "the scan may start half-way
> through the table, and then wrap around".  Searching "wrap" in
> gin/README gets nothing. 
> 

The "wrap around" is about the scan used to read data from the table
when building the index. A "sync scan" may start e.g. at TID (1000,0)
and read till the end of the table, and then wraps and returns the
remaining part at the beginning of the table for blocks 0-999.

This means the callback would not see a monotonically increasing
sequence of TIDs.

Which is why the serial build disables sync scans, allowing simply
appending values to the sorted list, and even with regular flushes of
data into the index we can simply append data to the posting lists.

> 2. I can't understand the below error.
> 
>>   ERROR: could not split GIN page; all old items didn't fit
> 
> When the posting list is too long, we have posting tree strategy. so in
> which sistuation we could get this ERROR. 
> 

AFAICS the index build relies on the assumption that we only append data
to the TID list on a leaf page, and when the page gets split, the "old"
part will always fit. Which may not be true, if there was a wrap around
and we're adding low TID values to the list on the leaf page.

FWIW the error in dataBeginPlaceToPageLeaf looks like this:

  if (!append || ItemPointerCompare(&maxOldItem, &remaining) >= 0)
    elog(ERROR, "could not split GIN page; all old items didn't fit");

It can fail simply because of the !append part.

I'm not sure why dataBeginPlaceToPageLeaf() relies on this assumption,
or with GIN details in general, and I haven't found any explanation. But
AFAIK this is why the serial build disables sync scans.

>> issue with efficiency - having such a wide TID list forces the mergesort
>> to actually walk the lists, because this wide list overlaps with every
>> other list produced by the worker.
> 
> If we split the blocks among worker 1-block by 1-block, we will have a
> serious issue like here.  If we can have N-block by N-block, and N-block
> is somehow fill the work_mem which makes the dedicated temp file, we
> can make things much better, can we? 
> 

I don't understand the question. The blocks are distributed to workers
by the parallel table scan, and it certainly does not do that block by
block. But even it it did, that's not a problem for this code.

The problem is that if the scan wraps around, then one of the TID lists
for a given worker will have the min TID and max TID, so it will overlap
with every other TID list for the same key in that worker. And when the
worker does the merging, this list will force a "full" merge sort for
all TID lists (for that key), which is very expensive.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dave Cramer
Дата:
Сообщение: request for database identifier in the startup packet
Следующее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Avoid orphaned objects dependencies, take 3