Re: Parallel Inserts in CREATE TABLE AS

From	Amit Kapila
Subject	Re: Parallel Inserts in CREATE TABLE AS
Date	May 29, 2021, 07:16:35
Msg-id	CAA4eK1+skEe12C+7aDkd+D2UwuNy_OnVo2-r4_bJkNxSV3G0vQ@mail.gmail.com
In reply to	Re: Parallel Inserts in CREATE TABLE AS (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List	pgsql-hackers

On Fri, May 28, 2021 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
> >
> > I captured below information with the attached patch
> > 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> > patch set. Testing details are attached in the file named "test".
> > Total time spent in LockRelationForExtension
> > Total time spent in GetPageWithFreeSpace
> > Total time spent in RelationAddExtraBlocks
> > Total number of times extended the relation in bulk
> > Total number of times extended the relation by one block
> > Total number of blocks added in bulk extension
> > Total number of times getting the page from FSM
> >
>
> In your results, the number of pages each process is getting from FSM
> is not matching with the number of blocks added. I think we need to
> increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
> because that is also called and the process can get a free page via
> the same. The other thing to check via debugger is when one worker
> adds the blocks in bulk does another parallel worker gets all those
> blocks. You can achieve that by allowing one worker (say worker-1) to
> extend the relation in bulk and then let it wait and allow another
> worker (say worker-2) to proceed and see if it gets all the pages
> added by worker-1 from FSM. You need to keep the leader also waiting
> or not perform any operation.
>

While looking at the results, I observed one more thing: we are
trying to parallelize I/O, which is probably why we are not seeing a
benefit in such cases. I think even for non-write queries there won't
be much (if any) benefit if we can't parallelize CPU usage. Basically,
the test you are doing is for the statement: explain analyze verbose
create table test as select * from tenk1;. In this statement there is
no qualification, and still a Gather node is generated for it; this
won't be the case if we check "select * from tenk1". Is that because
the patch completely ignores parallel_tuple_cost? Even so, the planner
should still prefer a serial plan due to parallel_setup_cost, so why
is that not happening? Anyway, I think we should not parallelize
queries where we can't parallelize CPU usage. Have you tried these
cases without changing any of the costing parameters for parallelism?
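
To make the costing argument concrete, here is a rough sketch (in
Python, not the planner's actual code) of how a serial seq scan and a
Gather over a parallel seq scan might compare for an unqualified scan.
The cost constants are the PostgreSQL defaults; the page count, row
count, and worker count for tenk1 are assumptions chosen for
illustration, and the formulas are a simplification of the real cost
model.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # PostgreSQL default values for these planner settings.
    seq_page_cost: float = 1.0
    cpu_tuple_cost: float = 0.01
    parallel_setup_cost: float = 1000.0
    parallel_tuple_cost: float = 0.1


def parallel_divisor(workers: int) -> float:
    # Workers plus a shrinking leader contribution (simplified).
    leader = 1.0 - 0.3 * workers
    return workers + max(leader, 0.0)


def serial_seqscan_cost(pages: int, rows: int, c: Costs) -> float:
    # No qualification: only page reads and per-tuple CPU.
    return c.seq_page_cost * pages + c.cpu_tuple_cost * rows


def gather_seqscan_cost(pages: int, rows: int, workers: int, c: Costs) -> float:
    # Only the CPU part is divided among processes; the I/O part is not.
    cpu = c.cpu_tuple_cost * rows / parallel_divisor(workers)
    scan = c.seq_page_cost * pages + cpu
    # Gather adds a fixed setup cost plus a per-tuple transfer cost.
    return scan + c.parallel_setup_cost + c.parallel_tuple_cost * rows


# Assumed sizes for tenk1 and an assumed worker count (illustrative only).
PAGES, ROWS, WORKERS = 358, 10_000, 2

default = Costs()
no_tuple_cost = Costs(parallel_tuple_cost=0.0)

print("serial:", serial_seqscan_cost(PAGES, ROWS, default))
print("gather (defaults):", gather_seqscan_cost(PAGES, ROWS, WORKERS, default))
print("gather (parallel_tuple_cost=0):",
      gather_seqscan_cost(PAGES, ROWS, WORKERS, no_tuple_cost))
```

Under these assumptions the serial scan costs about 458, the Gather
plan about 2400 with default settings, and still about 1400 when
parallel_tuple_cost is ignored, so parallel_setup_cost alone should
keep the serial plan cheaper for a table of this size.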

-- 
With Regards,
Amit Kapila.
