Обсуждение: COALESCE implementation question
I noticed that the COALESCE function is implemented as a case statement, with the result that: update t1 set f = Coalesce( (select fn from t2 x where x.f1 = t1.f1), t1.f1) has the following plan: Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=10) SubPlan -> Seq Scan on t2 x (cost=0.00..22.50 rows=10 width=4) -> Seq Scan on t2 x (cost=0.00..22.50 rows=1000 width=4) ie. it *seems* to scan t2 twice, because the resulting CASE statement for the subselect is: case when not (select fn from t2 x where x.f1 = t1.f1) is NULL then (select fn from t2 x where x.f1 = t1.f1)else t1.f1 end which does seem to imply two executions of the same select statement. I realize that the standard says: 2) COALESCE (V(1), V(2)) is equivalent to the following <case specification>: CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2) END 3) "COALESCE (V(1), V(2), . . . , V(n))", for n >= 3, is equivalent to the following <case specification>: CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE COALESCE (V(2), . . . , V(n)) END I was wondering if there was a reason that we interpret this literally, rather than implement a function? Or set a flag on the CaseExpr node to indicate that the 'result == whenClause', or some such. I am still hunting through the planner/optimizer to try to understand if this is feasible, and would appreciate any suggestions... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
P.S. What's worse, I should have mentioned that the plan *with* indexes seems flawed: Create Table t1(f1 int); Create Table t1(f1 int, f2 int); Create Unique Index t2f1 on t2(f1); Create Unique Index t2f2 on t2(f2); explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 = t1.f1), t1.f1); NOTICE: QUERY PLAN: Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=10) SubPlan -> Index Scan using t2f1 on t2 x (cost=0.00..8.14 rows=10width=4) -> Seq Scan on t2 x (cost=0.00..22.50 rows=1000 width=4) ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 17:22 5/08/00 +1000, Philip Warner wrote: >P.S. What's worse, I should have mentioned that the plan *with* indexes >seems flawed: > >Create Table t1(f1 int); >Create Table t1(f1 int, f2 int); >Create Unique Index t2f1 on t2(f1); >Create Unique Index t2f2 on t2(f2); > >explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 = >t1.f1), t1.f1); >NOTICE: QUERY PLAN: > >Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=10) > SubPlan > -> Index Scan using t2f1 on t2 x (cost=0.00..8.14 rows=10 width=4) > -> Seq Scan on t2 x (cost=0.00..22.50 rows=1000 width=4) > Oddly enough, I now think that the EXPLAIN output is a lie; it seems that it never does a sequential scan. So I am now even more confused... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > I realize that the standard says: > 2) COALESCE (V(1), V(2)) is equivalent to the following <case > specification> : > CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2) END > I was wondering if there was a reason that we interpret this literally, > rather than implement a function? Well, the standard is perfectly clear, isn't it? If V(1) has side effects then trying to optimize this into just one evaluation of V(1) will generate non-spec-compliant results. I'd have to agree that two evaluations are pretty annoying, though, and I wonder whether the spec authors *really* meant to demand double evaluation of the "winning" case item. Can anyone check whether Oracle and other DBMSes perform double evaluation? BTW, the "BETWEEN" expression has exactly the same issue. regards, tom lane
Philip Warner <pjw@rhyme.com.au> writes: > explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 = > t1.f1), t1.f1); > NOTICE: QUERY PLAN: > Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=10) > SubPlan > -> Index Scan using t2f1 on t2 x (cost=0.00..8.14 rows=10 width=4) > -> Seq Scan on t2 x (cost=0.00..22.50 rows=1000 width=4) This is a bug caused by interaction between two planning passes run on the same Query node. The parser thinks it's cool to generate a CASE parsetree with multiple paths to the same sub-select Query node, but in fact it is not cool because planning destructively alters the Query node contents. I'm amazed it didn't crash, to tell the truth. I have a patch but haven't applied it yet (been offline for most of two days due to telco idiocy :-(). regards, tom lane
At 22:36 5/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> I realize that the standard says: > >> 2) COALESCE (V(1), V(2)) is equivalent to the following <case >> specification> : >> CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2) END > >> I was wondering if there was a reason that we interpret this literally, >> rather than implement a function? > >Well, the standard is perfectly clear, isn't it? If V(1) has side >effects then trying to optimize this into just one evaluation of V(1) >will generate non-spec-compliant results. At least with the new function manager, if I feel te need I can write a 'CoalesceValues' function (at least for fixed numbers of parameters). >I'd have to agree that two evaluations are pretty annoying, though, >and I wonder whether the spec authors *really* meant to demand >double evaluation of the "winning" case item. Can anyone check >whether Oracle and other DBMSes perform double evaluation? It's very hard to believe that is what they meant, or even if they even considered the ramifications of their proposed implementation (I'm not really sure why they chose to describe the implementation and specifically to implement a 'function' as a case statement). eg. the result of the first execution *could* mean that the second execution returns NULL - fine for CASE, lousy for COALESCE. In fact it's pretty easy to write a function that causes COALESCE(f(), 1) to return NULL... Sadly, my usual yard stick (Dec/RDB) seems to evaluate twice (at least that's what it's planner says). And dumping a view with a coalesce statement produces a CASE statement, so it probably has no choice. Just seems daft to me. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 22:27 5/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 = >> t1.f1), t1.f1); >> NOTICE: QUERY PLAN: > >> Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=10) >> SubPlan >> -> Index Scan using t2f1 on t2 x (cost=0.00..8.14 rows=10 width=4) >> -> Seq Scan on t2 x (cost=0.00..22.50 rows=1000 width=4) > >This is a bug caused by interaction between two planning passes run >on the same Query node. The parser thinks it's cool to generate a >CASE parsetree with multiple paths to the same sub-select Query node, >but in fact it is not cool because planning destructively alters the >Query node contents. I'm amazed it didn't crash, to tell the truth. > >I have a patch but haven't applied it yet (been offline for most of >two days due to telco idiocy :-(). Thanks for this; I must admit I was very surprised not to get a response withing 24 hours! Is there any chance of sending me the patch - I have been looking at the sources for a while now, and it would be nice to see the answer... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: >> Well, the standard is perfectly clear, isn't it? If V(1) has side >> effects then trying to optimize this into just one evaluation of V(1) >> will generate non-spec-compliant results. > At least with the new function manager, if I feel te need I can write a > 'CoalesceValues' function (at least for fixed numbers of parameters). Mmm ... not really. You could detect nulls all right, but a function- based version of COALESCE would evaluate *all* its arguments exactly once, which is certainly wrong. If you don't stop evaluating with the one you decide to return, you are neither compliant with the spec nor safe (later expressions might yield errors if evaluated!) > Sadly, my usual yard stick (Dec/RDB) seems to evaluate twice (at least > that's what it's planner says). And dumping a view with a coalesce > statement produces a CASE statement, so it probably has no choice. Sounds like they do it the same as we do, ie, expand COALESCE into the specified CASE equivalent on sight. regards, tom lane
At 23:28 5/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >>> Well, the standard is perfectly clear, isn't it? If V(1) has side >>> effects then trying to optimize this into just one evaluation of V(1) >>> will generate non-spec-compliant results. > >> At least with the new function manager, if I feel te need I can write a >> 'CoalesceValues' function (at least for fixed numbers of parameters). > >Mmm ... not really. You could detect nulls all right, but a function- >based version of COALESCE would evaluate *all* its arguments exactly >once, which is certainly wrong. Good point. Although in the specific case (two args, one of them constant), it's not an issue. I guess I'll just have to live with double-evaluation, and a COALESCE than can return NULL. Grumble grumble... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: >> This is a bug caused by interaction between two planning passes run >> on the same Query node. The parser thinks it's cool to generate a >> CASE parsetree with multiple paths to the same sub-select Query node, >> but in fact it is not cool because planning destructively alters the >> Query node contents. I'm amazed it didn't crash, to tell the truth. >> >> I have a patch but haven't applied it yet (been offline for most of >> two days due to telco idiocy :-(). > Thanks for this; I must admit I was very surprised not to get a response > withing 24 hours! Is there any chance of sending me the patch - I have been > looking at the sources for a while now, and it would be nice to see the > answer... Well, I'm not *proud* of this patch, it's pretty much brute-force. But it will do until we get around to redesigning querytrees. See src/backend/optimizer/plan/subselect.c. I imagine the diff would apply to 7.0.* if you want to do that. regards, tom lane