Обсуждение: More on GROUP BY
While looking at all these parsetrees I wonder why the hell the GroupClause contains a complete copy of the TLE at all? The planner depends on finding a corresponding entry in the targetlist which should contain the same expression. At least it needs an equal junk TLE. For the query SELECT a, b FROM t1 GROUP BY b + 1; the parser in fact creates 3 TLE's where the last one is a junk result named "resjunk" for the "b + 1" expression and the GroupClause contains a totally equal TLE. Could someone explain that please? Wouldn't it be better to have another field (resgroupno e.g.) in the resdom which the GroupClause can reference? Then changing the resno's or even replacing the entire expression wouldn't hurt because make_subplanTargetList() could match them this way and the expressions for the subplans can be pulled out directly from the targetlist. And it would save processing the group clauses in the rewriting because they cannot contain Var nodes anymore and the entire list can be ignored. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> While looking at all these parsetrees I wonder why the hell > the GroupClause contains a complete copy of the TLE at all? > The planner depends on finding a corresponding entry in the > targetlist which should contain the same expression. At least > it needs an equal junk TLE. For the query > > SELECT a, b FROM t1 GROUP BY b + 1; > > the parser in fact creates 3 TLE's where the last one is a > junk result named "resjunk" for the "b + 1" expression and > the GroupClause contains a totally equal TLE. > > Could someone explain that please? > > Wouldn't it be better to have another field (resgroupno e.g.) > in the resdom which the GroupClause can reference? Then > changing the resno's or even replacing the entire expression > wouldn't hurt because make_subplanTargetList() could match > them this way and the expressions for the subplans can be > pulled out directly from the targetlist. And it would save > processing the group clauses in the rewriting because they > cannot contain Var nodes anymore and the entire list can be > ignored. I think I can comment on this. Aggregates had the similar problem. It was so long ago, I don't remember the solution, but it was a pain to keep the aggs up-to-date with the target list and varno changes. If you think a redesign will fix the problem, go ahead. I think the old problem may have been that the old Aggreg's kept pointers to matching target list entries, so there was the aggregate in the target list, and another separate list of aggregates in the Query structure. I think I removed the second copy, and just generated it in the executor, where it was needed. Please see parser/parse_agg.c for a description of how count(*) is handled differently than other aggregates. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
jwieck@debis.com (Jan Wieck) writes: > While looking at all these parsetrees I wonder why the hell > the GroupClause contains a complete copy of the TLE at all? > The planner depends on finding a corresponding entry in the > targetlist which should contain the same expression. At least > it needs an equal junk TLE. For the query > SELECT a, b FROM t1 GROUP BY b + 1; > the parser in fact creates 3 TLE's where the last one is a > junk result named "resjunk" for the "b + 1" expression and > the GroupClause contains a totally equal TLE. > Could someone explain that please? All true, but so what? It wastes a few bytes of memory during planning, I suppose... > Wouldn't it be better to have another field (resgroupno e.g.) > in the resdom which the GroupClause can reference? Then > changing the resno's or even replacing the entire expression > wouldn't hurt because make_subplanTargetList() could match > them this way and the expressions for the subplans can be > pulled out directly from the targetlist. And it would save > processing the group clauses in the rewriting because they > cannot contain Var nodes anymore and the entire list can be > ignored. I think I like better the idea of leaving the representation alone, but using equal() on the exprs to match groupclause items to targetlist entries. That way, manipulation of the targetlist can't accidentally cause the grouplist to look like it contains something different than what it should have. It doesn't bother me that the planner can fail if it is unable to match a group item to a targetlist item --- that's a good crosscheck that nothing's gone wrong. (But matching on just the resno is unsafe, as you said before.) I think it's true that using a TLE for each grouplist item is a waste of space, and that representing the grouplist as simply a list of expr's would be good enough. But pulling out the TLE decoration seems like it's not an appropriate use of time at this stage of the release cycle. I'd say hold off till after 6.5, then fold it in with the parsetree redesign that you keep muttering we need (I agree!). BTW, you keep using the term "RTE" which I'm not familiar with --- I assume it's just referring to the parse tree nodes? regards, tom lane
> > Wouldn't it be better to have another field (resgroupno e.g.) > > in the resdom which the GroupClause can reference? Then > > changing the resno's or even replacing the entire expression > > wouldn't hurt because make_subplanTargetList() could match > > them this way and the expressions for the subplans can be > > pulled out directly from the targetlist. And it would save > > processing the group clauses in the rewriting because they > > cannot contain Var nodes anymore and the entire list can be > > ignored. > > I think I like better the idea of leaving the representation alone, > but using equal() on the exprs to match groupclause items to targetlist > entries. That way, manipulation of the targetlist can't accidentally > cause the grouplist to look like it contains something different than > what it should have. It doesn't bother me that the planner can fail > if it is unable to match a group item to a targetlist item --- that's > a good crosscheck that nothing's gone wrong. (But matching on just > the resno is unsafe, as you said before.) The real problem is that it is hard to keep all these items synchronized as they pass around through the stages. I looked at the aggregates, and it looks like that has a separate copy too, so it may not be that bad. We may just be missing a pass that makes appropriate changes in GROUP clauses, but I am not sure. > I think it's true that using a TLE for each grouplist item is a waste of > space, and that representing the grouplist as simply a list of expr's > would be good enough. But pulling out the TLE decoration seems like > it's not an appropriate use of time at this stage of the release cycle. > I'd say hold off till after 6.5, then fold it in with the parsetree > redesign that you keep muttering we need (I agree!). Basically, it is my fault that I am bringing up the issue so late. If I had done the Open Items list earlier, we would not be so close to the final. Jan is currently researching it, and we have the regression tests and three weeks. He usually does a fine job of fixing things. Jan, find out what you think is required to get this working, and if it is not too bad, maybe we should go ahead. What does the Aggreg do? Does it has similar duplication? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > All true, but so what? It wastes a few bytes of memory during > planning, I suppose... Not only memory, rewriting takes time too. If the GROUP BY clause has view columns, the rewriter has to change all the expressions twice, one time in the targetlist, another time in the group clause. Wasted efford if they should still be equal after rewriting. > I think it's true that using a TLE for each grouplist item is a waste of > space, and that representing the grouplist as simply a list of expr's > would be good enough. But pulling out the TLE decoration seems like > it's not an appropriate use of time at this stage of the release cycle. > I'd say hold off till after 6.5, then fold it in with the parsetree > redesign that you keep muttering we need (I agree!). I'm sure since I began to look at it that it's not such a good idea to make those changes at this stage. But in fact some of them have to be done to make GROUP BY in views more robust. > > BTW, you keep using the term "RTE" which I'm not familiar with --- > I assume it's just referring to the parse tree nodes? RTE == RangeTableEntry. The relations where the data in the querytree is coming from. The varno in Var nodes is the index into that table so the varno plus the varattno identify one relations attribute. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Bruce Momjian wrote: > The real problem is that it is hard to keep all these items synchronized > as they pass around through the stages. I looked at the aggregates, and > it looks like that has a separate copy too, so it may not be that bad. > We may just be missing a pass that makes appropriate changes in GROUP > clauses, but I am not sure. Exactly the probability of missing some changes somewhere is why I would like to get rid of the field entry in GroupClause. Having to keep duplicate items synced isn't the spirit of a relational database. It's like using triggers to keep two tables in sync where a simple reference and a view would do a better job. > > I think it's true that using a TLE for each grouplist item is a waste of > > space, and that representing the grouplist as simply a list of expr's > > would be good enough. But pulling out the TLE decoration seems like > > it's not an appropriate use of time at this stage of the release cycle. > > I'd say hold off till after 6.5, then fold it in with the parsetree > > redesign that you keep muttering we need (I agree!). > > Basically, it is my fault that I am bringing up the issue so late. If I > had done the Open Items list earlier, we would not be so close to the > final. And my fault to spend too much time playing around with a raytracer. Developers should develop, publishers should publish. > > Jan is currently researching it, and we have the regression tests and > three weeks. He usually does a fine job of fixing things. Thanks for the compliment :-) > Jan, find > out what you think is required to get this working, and if it is not too > bad, maybe we should go ahead. > > What does the Aggreg do? Does it has similar duplication? Not AFAICS. Aggregates can only appear in the targetlist by default. and the nodes below them are the expression to aggregate over. If an aggregate should appear in the WHERE clause it must be placed into a proper subselect (the rule system already tries to do so if an aggregate column is used in the WHERE of a view select, but fails sometimes). I'll go ahead now in little steps. 1. Get rid of the TLE copy in GroupClause. 2. Move the targetlist expansion into the rule system. 3. Rewrite the subquery creation for aggregates in the WHERE clause to take view grouping into account. 4. Allow qualifications against aggregates to be given as "AggCol op Value" by swapping the expression and using the negator operator (if one exists). As you said, we have three weeks. Let's see what one Wieck can do in this time :-) Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
I wrote: > I'll go ahead now in little steps. > > 1. Get rid of the TLE copy in GroupClause. Done. GroupClause now identifies the TLE by a number which is placed into the Resdom by parser/rewriter. New initdb required because of modifications in node print/read functions. > > 2. Move the targetlist expansion into the rule system. Not required anymore AFAICS. Instead I modified the targetlist preprocessing in the planner so that junk attributes used by group clauses get added again to the expanded list. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #