Re: Parallel Inserts in CREATE TABLE AS

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Parallel Inserts in CREATE TABLE AS
Дата
Msg-id CALj2ACUK58u-pQXmJjaF97b20v=ugppiwAiwZtgLFXn7SNSY8Q@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Parallel Inserts in CREATE TABLE AS  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Ответы Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Attaching v21 patch set, which has following changes:
> > 1) 0001 - changed fpes->ins_cmd_type ==
> > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> > PARALLEL_INSERT_CMD_UNDEF
> > 2) 0002 - reworded the commit message.
> > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> > to ensure leader and worker insert the tuples in the same xact and replaced
> > memory usage output in numbers like 25kB to NkB to make the tests stable.
> > 4) 0004 - updated one of the test output to be in NkB and made the assertion
> > in SetParallelInsertState to be not under an if condition.
> >
> > There's one open point [1] on selective skipping of error "cannot insert
> > tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> > welcome.
> >
> > Please consider the v21 patch set for further review.
>
> Hi,
>
> I took a look into the new patch and have some comments.

Thanks.

> 1.
> +       /*
> +        * Do not consider tuple cost in case of we intend to perform parallel
> +        * inserts by workers. We would have turned on the ignore flag in
> +        * apply_scanjoin_target_to_paths before generating Gather path for the
> +        * upper level SELECT part of the query.
> +        */
> +       if ((root->parse->parallelInsCmdTupleCostOpt &
> +                PARALLEL_INSERT_SELECT_QUERY) &&
> +               (root->parse->parallelInsCmdTupleCostOpt &
> +                PARALLEL_INSERT_CAN_IGN_TUP_COST))
>
> Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
> IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.

+1. Changed.

> 2.
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                                          void *ins_info)
> ...
> +               info = (ParallelInsertCTASInfo *) ins_info;
> +               intoclause_str = nodeToString(info->intoclause);
> +               intoclause_len = strlen(intoclause_str) + 1;
>
> +static void
> +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                                          void *ins_info)
> ...
> +               info = (ParallelInsertCTASInfo *)ins_info;
> +               intoclause_str = nodeToString(info->intoclause);
> +               intoclause_len = strlen(intoclause_str) + 1;
> +               intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
>
> I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
> Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?

I wanted to keep the API generic, not do nodeToString, strlen outside
and pass it to the APIs. I don't think it will add too much function
call cost since it's run only in the leader.  This way, the code and
API looks more readable. Thoughts?

> 3.
> +       if (node->need_to_scan_locally || node->nworkers_launched == 0)
> +       {
> +               EState     *estate = node->ps.state;
> +               TupleTableSlot *outerTupleSlot;
> +
> +               for(;;)
> +               {
> +                       /* Install our DSA area while executing the plan. */
> +                       estate->es_query_dsa =
> +                                       node->pei ? node->pei->area : NULL;
> ...
> +                       node->ps.state->es_processed++;
> +               }
>
> How about use the variables estate like 'estate-> es_processed++;'
> Instead of node->ps.state->es_processed++;

+1. Changed.

Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: pg_preadv() and pg_pwritev()
Следующее
От: Thomas Munro
Дата:
Сообщение: O(n^2) system calls in RemoveOldXlogFiles()