On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Attaching v21 patch set, which has following changes:
> > 1) 0001 - changed fpes->ins_cmd_type ==
> > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> > PARALLEL_INSERT_CMD_UNDEF
> > 2) 0002 - reworded the commit message.
> > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> > to ensure leader and worker insert the tuples in the same xact and replaced
> > memory usage output in numbers like 25kB to NkB to make the tests stable.
> > 4) 0004 - updated one of the test output to be in NkB and made the assertion
> > in SetParallelInsertState to be not under an if condition.
> >
> > There's one open point [1] on selective skipping of error "cannot insert
> > tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> > welcome.
> >
> > Please consider the v21 patch set for further review.
>
> Hi,
>
> I took a look into the new patch and have some comments.
Thanks.
> 1.
> + /*
> + * Do not consider tuple cost in case of we intend to perform parallel
> + * inserts by workers. We would have turned on the ignore flag in
> + * apply_scanjoin_target_to_paths before generating Gather path for the
> + * upper level SELECT part of the query.
> + */
> + if ((root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_SELECT_QUERY) &&
> + (root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_CAN_IGN_TUP_COST))
>
> Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
> IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.
+1. Changed.
> 2.
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> + void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *) ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
>
> +static void
> +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> + void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *)ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
> + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
>
> I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
> Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?
I wanted to keep the API generic, not do nodeToString, strlen outside
and pass it to the APIs. I don't think it will add too much function
call cost since it's run only in the leader. This way, the code and
API looks more readable. Thoughts?
> 3.
> + if (node->need_to_scan_locally || node->nworkers_launched == 0)
> + {
> + EState *estate = node->ps.state;
> + TupleTableSlot *outerTupleSlot;
> +
> + for(;;)
> + {
> + /* Install our DSA area while executing the plan. */
> + estate->es_query_dsa =
> + node->pei ? node->pei->area : NULL;
> ...
> + node->ps.state->es_processed++;
> + }
>
> How about use the variables estate like 'estate-> es_processed++;'
> Instead of node->ps.state->es_processed++;
+1. Changed.
Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com