Обсуждение: Strange result with LATERAL query
Hi,
While playing with LATERAL along with some aggregates in sub-query, I have
observed somewhat unusual behavior.
Consider following steps:
create table tab1(c1 int, c2 int);
insert into tab1 select id, 1 from generate_series(1, 3) id;
create function sum_tab1(extra int) returns setof bigint as $$
select sum(c1 + extra) sum from tab1 group by c1
$$ language sql;
While playing with LATERAL along with some aggregates in sub-query, I have
observed somewhat unusual behavior.
Consider following steps:
create table tab1(c1 int, c2 int);
insert into tab1 select id, 1 from generate_series(1, 3) id;
create function sum_tab1(extra int) returns setof bigint as $$
select sum(c1 + extra) sum from tab1 group by c1
$$ language sql;
-- This gives wrong output
select c1, c2, sum from tab1 t1, lateral
(select sum(t2.c1 + t1.c1) sum from tab1 t2 group by t2.c1) qry
order by 1, 2, 3;
(select sum(t2.c1 + t1.c1) sum from tab1 t2 group by t2.c1) qry
order by 1, 2, 3;
-- This gives correct output
select c1, c2, sum from tab1 t1, lateral
(select sum_tab1 sum from sum_tab1(c1)) qry
order by 1, 2, 3;
I would expect same result from both these queries, but unfortunately not.
Per my understanding, second query involving function gives me correct result
where as first query's output seems wrong.
Is this an expected behavior OR we are giving wrong result in case of first
query?
Thanks(select sum_tab1 sum from sum_tab1(c1)) qry
order by 1, 2, 3;
I would expect same result from both these queries, but unfortunately not.
Per my understanding, second query involving function gives me correct result
where as first query's output seems wrong.
Is this an expected behavior OR we are giving wrong result in case of first
query?
--
Jeevan B Chalke
>>>>> "Jeevan" == Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes: Jeevan> Hi,Jeevan> While playing with LATERAL along with some aggregates inJeevan> sub-query, I have observed somewhat unusualbehavior. Simpler example not needing LATERAL: select array(select sum(x+y) from generate_series(1,3) y group by y) from generate_series(1,3) x;?column? ----------{2,4,3}{2,4,3}{2,4,3} (3 rows) This is broken all the way back, it's not a recent bug. Something is wrong with the way chgParam is being handled in Agg nodes. The code in ExecReScanAgg seems to assume that if the lefttree doesn't have any parameter changes then it suffices to re-project the data from the existing hashtable; but of course this is nonsense if the parameter is in an input to an aggregate function. -- Andrew (irc:RhodiumToad)
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >>>>> "Jeevan" == Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes: Jeevan> Hi,Jeevan> While playing with LATERAL along with some aggregates inJeevan> sub-query, I have observed somewhat unusualbehavior. Andrew> Simpler example not needing LATERAL: Andrew> select array(select sum(x+y) from generate_series(1,3) y group by y)Andrew> from generate_series(1,3) x; Oh, and another way to verify that it's a bug and not expected behavior is that it goes away with set enable_hashagg=false; -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Something is wrong with the way chgParam is being handled in Agg nodes. > The code in ExecReScanAgg seems to assume that if the lefttree doesn't > have any parameter changes then it suffices to re-project the data from > the existing hashtable; but of course this is nonsense if the parameter > is in an input to an aggregate function. It looks like it's sufficient to do this: diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 1ec2515..f468fad 100644 *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *************** ExecReScanAgg(AggState *node) *** 3425,3435 **** return; /* ! * If we do have the hash table and the subplan does not have any ! * parameter changes, then we can just rescan the existing hash table; ! * no need to build it again. */ ! if (outerPlan->chgParam == NULL) { ResetTupleHashIterator(node->hashtable, &node->hashiter); return; --- 3425,3436 ---- return; /* ! * If we do have the hash table and there are no relevant parameter ! * changes, then we can just rescan the existing hash table; no need ! * to build it again. */ ! if (node->ss.ps.chgParam == NULL && ! outerPlan->chgParam == NULL) { ResetTupleHashIterator(node->hashtable, &node->hashiter); return; I'm not sure if it's worth trying to distinguish whether the Param is inside any aggregate calls or not. The existing code gets the right answer for select array(select x+sum(y) from generate_series(1,3) y group by y) from generate_series(1,3) x; and we'd be losing some efficiency for cases like that if we fix it as above. But is it worth the trouble? regards, tom lane
2016-08-24 17:08 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Something is wrong with the way chgParam is being handled in Agg nodes.
> The code in ExecReScanAgg seems to assume that if the lefttree doesn't
> have any parameter changes then it suffices to re-project the data from
> the existing hashtable; but of course this is nonsense if the parameter
> is in an input to an aggregate function.
It looks like it's sufficient to do this:
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/ nodeAgg.c
index 1ec2515..f468fad 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** ExecReScanAgg(AggState *node)
*** 3425,3435 ****
return;
/*
! * If we do have the hash table and the subplan does not have any
! * parameter changes, then we can just rescan the existing hash table;
! * no need to build it again.
*/
! if (outerPlan->chgParam == NULL)
{
ResetTupleHashIterator(node->hashtable, &node->hashiter);
return;
--- 3425,3436 ----
return;
/*
! * If we do have the hash table and there are no relevant parameter
! * changes, then we can just rescan the existing hash table; no need
! * to build it again.
*/
! if (node->ss.ps.chgParam == NULL &&
! outerPlan->chgParam == NULL)
{
ResetTupleHashIterator(node->hashtable, &node->hashiter);
return;
I'm not sure if it's worth trying to distinguish whether the Param is
inside any aggregate calls or not. The existing code gets the right
answer for
select array(select x+sum(y) from generate_series(1,3) y group by y)
from generate_series(1,3) x;
and we'd be losing some efficiency for cases like that if we fix
it as above. But is it worth the trouble?
The result should not depend on GUC - hashagg on/off changing output - it is error.
Regards
Pavel
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I'm not sure if it's worth trying to distinguish whether the ParamTom> is inside any aggregate calls or not. The existingcode gets theTom> right answer for Tom> select array(select x+sum(y) from generate_series(1,3) y group by y) Tom> from generate_series(1,3) x; Tom> and we'd be losing some efficiency for cases like that if we fixTom> it as above. But is it worth the trouble? The loss of efficiency could be significant, since it's forcing a rescan of what could be an expensive plan. -- Andrew (irc:RhodiumToad)
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I'm not sure if it's worth trying to distinguish whether the Param Tom> is inside any aggregate calls or not. How about: -- Andrew (irc:RhodiumToad) diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 1ec2515..4a9fefb 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -3426,10 +3426,11 @@ ExecReScanAgg(AggState *node) /* * If we do have the hash table and the subplan does not have any - * parameter changes, then we can just rescan the existing hash table; - * no need to build it again. + * parameter changes, we might still need to rebuild the hash if any of + * the parameter changes affect the input to aggregate functions. */ - if (outerPlan->chgParam == NULL) + if (outerPlan->chgParam == NULL + && !bms_overlap(node->ss.ps.chgParam, aggnode->aggParam)) { ResetTupleHashIterator(node->hashtable, &node->hashiter); return; diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index c7a0644..7b5eb86 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -871,6 +871,7 @@ _copyAgg(const Agg *from) COPY_SCALAR_FIELD(aggstrategy); COPY_SCALAR_FIELD(aggsplit); COPY_SCALAR_FIELD(numCols); + COPY_BITMAPSET_FIELD(aggParam); if (from->numCols > 0) { COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber)); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 1fab807..5e48edd 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -706,6 +706,7 @@ _outAgg(StringInfo str, const Agg *node) WRITE_ENUM_FIELD(aggstrategy, AggStrategy); WRITE_ENUM_FIELD(aggsplit, AggSplit); WRITE_INT_FIELD(numCols); + WRITE_BITMAPSET_FIELD(aggParam); appendStringInfoString(str, " :grpColIdx"); for (i = 0; i < node->numCols; i++) diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index c83063e..67dcf17 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -2004,6 +2004,7 @@ _readAgg(void) READ_ENUM_FIELD(aggstrategy, AggStrategy); READ_ENUM_FIELD(aggsplit, AggSplit); READ_INT_FIELD(numCols); + READ_BITMAPSET_FIELD(aggParam); READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols); READ_OID_ARRAY(grpOperators, local_node->numCols); READ_LONG_FIELD(numGroups); diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index a46cc10..2e56484 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -82,6 +82,7 @@ static Bitmapset *finalize_plan(PlannerInfo *root, Bitmapset *valid_params, Bitmapset *scan_params); static bool finalize_primnode(Node *node, finalize_primnode_context *context); +static bool finalize_agg_primnode(Node *node, finalize_primnode_context *context); /* @@ -2659,8 +2660,30 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params, &context); break; - case T_Hash: case T_Agg: + { + Agg *agg = (Agg *) plan; + finalize_primnode_context aggcontext; + + /* + * We need to know which params are referenced in inputs to + * aggregate calls, so that we know whether we need to rebuild + * the hashtable in the AGG_HASHED case. Rescan the tlist and + * qual for this purpose. + */ + + aggcontext = context; + aggcontext.paramids = NULL; + + finalize_agg_primnode((Node *) agg->plan.targetlist, &aggcontext); + finalize_agg_primnode((Node *) agg->plan.qual, &aggcontext); + + /* remember results for execution */ + agg->aggParam = aggcontext.paramids; + } + break; + + case T_Hash: case T_Material: case T_Sort: case T_Unique: @@ -2812,6 +2835,24 @@ finalize_primnode(Node *node, finalize_primnode_context *context) } /* + * finalize_agg_primnode: add IDs of all PARAM_EXEC params appearing inside + * Aggref nodes in the given expression tree to the result set. + */ +static bool +finalize_agg_primnode(Node *node, finalize_primnode_context *context) +{ + if (node == NULL) + return false; + if (IsA(node, Aggref)) + { + finalize_primnode(node, context); + return false; /* no more to do here */ + } + return expression_tree_walker(node, finalize_agg_primnode, + (void *) context); +} + +/* * SS_make_initplan_output_param - make a Param for an initPlan's output * * The plan is expected to return a scalar value of the given type/collation. diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 369179f..3d5e4df 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -712,6 +712,7 @@ typedef struct Agg AggStrategy aggstrategy; /* basic strategy, see nodes.h */ AggSplit aggsplit; /* agg-splitting mode, see nodes.h */ int numCols; /* number of grouping columns */ + Bitmapset *aggParam; /* params used in Aggref inputs */ AttrNumber *grpColIdx; /* their indexes in the target list */ Oid *grpOperators; /* equality operators to compare with */ long numGroups; /* estimated number of groups in input */
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes: Pavel> The result should not depend on GUC - hashagg on/off changingPavel> output - it is error. I don't think anyone's suggesting leaving it unfixed, just whether the fix should introduce unnecessary rescans of the aggregate input. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > How about: Hm, I was just working on inserting something of the sort into ExecInitAgg. But I guess we could do it in the planner too. Will run with your approach. I think it's a bit too stupid as-is, though. We don't need to recalculate for Params in aggdirectargs, do we? regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Hm, I was just working on inserting something of the sort intoTom> ExecInitAgg. But I guess we could do it in the plannertoo. WillTom> run with your approach. Tom> I think it's a bit too stupid as-is, though. We don't need toTom> recalculate for Params in aggdirectargs, do we? In theory we would need to. But in practice we don't, because we don't allow ordered aggs in AGG_HASHED mode anyway. We could skip filling in aggParam at all if not in AGG_HASHED mode I guess. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> I think it's a bit too stupid as-is, though. We don't need to > Tom> recalculate for Params in aggdirectargs, do we? > In theory we would need to. How come? Those are only passed to the final function, no? They shouldn't affect what is in the hash table. > But in practice we don't, because we don't > allow ordered aggs in AGG_HASHED mode anyway. True, it's moot at the moment. > We could skip filling in aggParam at all if not in AGG_HASHED mode I > guess. Yeah, I'm planning to make it do that. regards, tom lane