Re: Parallel Aggregate

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Parallel Aggregate
Дата
Msg-id CAA4eK1+zm7iog8RHSrFC-AZ0qxKPJiTekJAOSwHmW5AG9_nTRQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Parallel Aggregate  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref.  But that's not what you've got here.  A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now.  Alternatively, Aggref can do everything.  But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>

Few assorted comments:

1.
  /*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)

I think here you need to use has_parallel_hazard() with second parameter as false to ensure expressions are parallel safe. glob->parallelModeOK flag indicates that there is no parallel unsafe expression, but it can still contain parallel restricted expression.


2.
 AggPath *
 create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
 
List *groupClause,
  List *qual,
  const AggClauseCosts 
*aggcosts,
- double numGroups)
+ double numGroups,
+
bool combineStates,
+ bool finalizeAggs)

Don't you need to set parallel_aware flag in this function as we do for create_seqscan_path()?

3.
postgres=# explain select count(*) from t1;
                                      QUERY PLAN

--------------------------------------------------------------------------------
------
 Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
   ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 wid
th=0)
(5 rows)

Isn't it better to call it as Parallel Aggregate instead of Partial Aggregate.  Initialy, we have kept Partial for seqscan, but later on we changed to Parallel Seq Scan, so I am not able to think why it is better to call Partial incase of Aggregates.

4.
  /*
+ * Likewise for any partial paths, although this case is more simple as
+
 * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+
{
+ Path   *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == 
NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+
subpath, scanjoin_target);
+ }
+


Can't we do this by teaching apply_projection_to_path() as done in the latest patch posted by me to push down the target list beneath workers [1].

5.
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that 
pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree 
to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+
context->allowedtype = PAT_INTERNAL_ONLY;

In the above comment, you have refered maximum degree which is not making much sense to me. If it is not a typo, then can you clarify the same.

6.
+ * fix_combine_agg_expr
+ *  Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ *  Aggrefs so 
that they references the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo 
*root,
+   Node *node,
+   indexed_tlist *subplan_itlist,
+   
Index newvarno,
+   int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = 
root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+
return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, 
fix_upper_expr_context *context)

Don't we want to handle the case of context->subplan_itlist->has_non_vars as it is handled in fix_upper_expr_mutator()?  If not then, I think adding the reason for same in comments above function would be better.

7.
tlist.c

+}
\ No newline at end of file

There should be a new line at end of file.




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Valery Popov
Дата:
Сообщение: Re: Password identifiers, protocol aging and SCRAM protocol
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Parallel Aggregate