Re: Parallel Aggregate

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Parallel Aggregate
Дата	16 марта 2016 г. 15:19:27
Msg-id	CAA4eK1+zm7iog8RHSrFC-AZ0qxKPJiTekJAOSwHmW5AG9_nTRQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel Aggregate (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: Parallel Aggregate (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref. But that's not what you've got here. A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now. Alternatively, Aggref can do everything. But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>

Few assorted comments:

+ * Determine if it's possible to perform aggregation in parallel using

+ * multiple worker processes. We can permit this when there's at least one

+ * partial_path in input_rel, but not if the query has grouping sets,

+ * (although this likely just requires a bit more thought). We must also

+ * ensure that any aggregate functions which are present in either the

+ * target list, or in the HAVING clause all support parallel mode.

+ */

+ can_parallel = false;

+ if ((parse->hasAggs || parse->groupClause != NIL) &&

+ input_rel->partial_pathlist != NIL &&

+ parse->groupingSets == NIL &&

+ root->glob->parallelModeOK)

I think here you need to use has_parallel_hazard() with second parameter as false to ensure expressions are parallel safe. glob->parallelModeOK flag indicates that there is no parallel unsafe expression, but it can still contain parallel restricted expression.

AggPath *

create_agg_path(PlannerInfo *root,

@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,

List *groupClause,

List *qual,

const AggClauseCosts

*aggcosts,

- double numGroups)

+ double numGroups,

bool combineStates,

+ bool finalizeAggs)

Don't you need to set parallel_aware flag in this function as we do for create_seqscan_path()?

postgres=# explain select count(*) from t1;

QUERY PLAN

--------------------------------------------------------------------------------

------

Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8)

-> Gather (cost=45420.35..45420.56 rows=2 width=8)

Number of Workers: 2

-> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8)

-> Parallel Seq Scan on t1 (cost=0.00..44107.88 rows=124988 wid

th=0)

(5 rows)

Isn't it better to call it as Parallel Aggregate instead of Partial Aggregate. Initialy, we have kept Partial for seqscan, but later on we changed to Parallel Seq Scan, so I am not able to think why it is better to call Partial incase of Aggregates.

+ * Likewise for any partial paths, although this case is more simple as

* we don't track the cheapest path.

+ */

+ foreach(lc, current_rel->partial_pathlist)

{

+ Path *subpath = (Path *) lfirst(lc);

+ Assert(subpath->param_info ==

NULL);

+ lfirst(lc) = apply_projection_to_path(root, current_rel,

subpath, scanjoin_target);

+ }

Can't we do this by teaching apply_projection_to_path() as done in the latest patch posted by me to push down the target list beneath workers [1].

+ /*

+ * If we find any aggs with an internal transtype then we must ensure

+ * that

pointers to aggregate states are not passed to other processes,

+ * therefore we set the maximum degree

to PAT_INTERNAL_ONLY.

+ */

+ if (aggform->aggtranstype == INTERNALOID)

context->allowedtype = PAT_INTERNAL_ONLY;

In the above comment, you have refered maximum degree which is not making much sense to me. If it is not a typo, then can you clarify the same.

+ * fix_combine_agg_expr

+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of

+ * Aggrefs so

that they references the corresponding Aggref in the subplan.

+ */

+static Node *

+fix_combine_agg_expr(PlannerInfo

*root,

+ Node *node,

+ indexed_tlist *subplan_itlist,

Index newvarno,

+ int rtoffset)

+ fix_upper_expr_context context;

+ context.root =

root;

+ context.subplan_itlist = subplan_itlist;

+ context.newvarno = newvarno;

+ context.rtoffset = rtoffset;

return fix_combine_agg_expr_mutator(node, &context);

+static Node *

+fix_combine_agg_expr_mutator(Node *node,

fix_upper_expr_context *context)

Don't we want to handle the case of context->subplan_itlist->has_non_vars as it is handled in fix_upper_expr_mutator()? If not then, I think adding the reason for same in comments above function would be better.

tlist.c

\ No newline at end of file

There should be a new line at end of file.

[1] - http://www.postgresql.org/message-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления: