plan_create_index_workers doesn't account for TOAST

Поиск
Список
Период
Сортировка
От Jonathan S. Katz
Тема plan_create_index_workers doesn't account for TOAST
Дата
Msg-id ad8a178f-bbe7-d89d-b407-2f0fede93144@postgresql.org
обсуждение исходный текст
Ответы Re: plan_create_index_workers doesn't account for TOAST  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Список pgsql-hackers
Hi,

plan_create_index_workers[1] does not consider the amount of tuples 
existing in TOAST pages when determining the number of parallel workers 
to use for a build. The estimation comes from estimate_rel_size[2], 
which in this case, will just take the value from rel->rd_rel->relpages.

We probably don't notice this much with B-trees, given a B-tree is 
typically used for data that does not require toasting. However, this 
becomes more visible when working on custom index access methods that 
implement their own parallel build strategy.

For example, pgvector[3] provides its own data types and index access 
method for indexing vector data. Vectors can get quite large fairly 
quickly, e.g. a 768-dimensional vector takes up 8 + 4*768 = 3080 bytes 
on disk, which quickly clears the default TOAST tuple threshold.

In a recent patch proposal to allow for building indexes in parallel[4], 
I performed a few experiments on how many parallel workers would be 
spawned when indexing 1,000,000 (1MM) 768-dim vectors, both with 
EXTEDNED (default) and PLAIN storage. In all cases, I allowed for leader 
participation, but the leader is not considered in 
plan_create_index_workers.

With EXTENDED, plan_create_index_workers recommended 2 workers. The 
build time was ~2x faster than the serial build.

With PLAIN, plan_create_index_workers recommended 4 workers. The build 
time was **~3X faster** than the serial build.

(I've been doing more detailed, less hand-waivy performance testing, but 
I wanted to provide directional numbers here)

It seems like we're leaving some performance for columns with TOASTed 
data that require indexing, so I wanted to propose allowing the pages in 
TOASTed tables to be considered when we're trying to index a column with 
TOASTed attributes.

Thanks,

Jonathan

[1] 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/plan/planner.c;hb=refs/heads/master#l6734
[2] 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/util/plancat.c;hb=refs/heads/master#l1117
[3] https://github.com/pgvector/pgvector
[4] https://github.com/pgvector/pgvector/commits/parallel-index-build

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Álvaro Herrera
Дата:
Сообщение: Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?
Следующее
От: Alena Rybakina
Дата:
Сообщение: Re: POC, WIP: OR-clause support for indexes