Обсуждение: Re: [GENERAL] Recursive optimization of IN subqueries

Поиск
Список
Период
Сортировка

Re: [GENERAL] Recursive optimization of IN subqueries

От
Dennis Haney
Дата:
Tom Lane wrote:<br /><blockquote cite="mid27924.1074880349@sss.pgh.pa.us" type="cite"><pre wrap="">Dennis Haney <a
class="moz-txt-link-rfc2396E"href="mailto:davh@diku.dk"><davh@diku.dk></a> writes: </pre><blockquote
type="cite"><prewrap="">I saw it as though convert_IN_to_join rewrote the query from   </pre></blockquote><blockquote
type="cite"><prewrap="">select a.* from tenk1 a where a.unique1 in
 
(select c.thousand from tenk1 c where c.hundred = 99);   </pre></blockquote><blockquote type="cite"><pre wrap="">to
</pre></blockquote><blockquotetype="cite"><pre wrap="">select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand
AND
 
c.hundred = 99;   </pre></blockquote><blockquote type="cite"><pre wrap="">but after looking at it, I've reached the
conclusionthat the rewrite is 
 
to this instead:   </pre></blockquote><blockquote type="cite"><pre wrap="">select a.* from tenk1 a,  (select d.thousand
fromtenk1 d where 
 
d.hundred = 99) as c where a.unique1 = c.thousand;   </pre></blockquote><pre wrap="">
Right.  We do that, and then subsequently pull_up_subqueries transforms
it to the other representation.  The reason for this two-step approach
is that the intermediate form is still a useful improvement if the
subquery can't be pulled up for some reason (e.g., it's got grouping). </pre></blockquote> With improvement I can see
thatit can be materialized and thus used as a normal table in the planner. Is there any additional reasons that I can't
see?<br/> But this limited optimization makes me wonder, why the limitation to optimizing '='?<br /> And why must the
lefthandof the sublink be a variable of the upper query?<br /><br /><br /><blockquote
cite="mid27924.1074880349@sss.pgh.pa.us"type="cite"><pre wrap=""></pre><blockquote type="cite"><pre wrap="">except the
subselectis added as a range table entry instead of a 
 
subselect in the from-list (not that I understand this particular part, 
do you mind explaining?).   </pre></blockquote><pre wrap="">
Same thing.  Every entry in the from-list will have both an RTE and an
entry in the join tree.  This representation is partly historical
(before we had outer joins, there was only the range table and no join
tree at all), but it is convenient for many purposes. </pre></blockquote><br /> Then I don't understand why it gives
twodifferent execution plans? And the Query* is totally different for the two, eg. there is no RTE for the subquery in
thefirst query:<br /><br /><pre>davh=# explain select a.* from test1 a, (select num from test1 where id = 2) as b where
a.num= b.num;                                    QUERY PLAN
 
------------------------------------------------------------------------------------Hash Join  (cost=4.83..29.94
rows=11width=8)  Hash Cond: ("outer".num = "inner".num)  ->  Seq Scan on test1 a  (cost=0.00..20.00 rows=1000
width=8) ->  Hash  (cost=4.82..4.82 rows=2 width=4)        ->  Index Scan using test1_pkey on test1
(cost=0.00..4.82rows=2 width=4)              Index Cond: (id = 2)
 
(6 rows)

davh=# explain select a.* from test1 a where a.num in (select num from test1 where id = 2);
      QUERY PLAN
 
------------------------------------------------------------------------------------Hash IN Join  (cost=4.83..28.75
rows=6width=8)  Hash Cond: ("outer".num = "inner".num)  ->  Seq Scan on test1 a  (cost=0.00..20.00 rows=1000
width=8) ->  Hash  (cost=4.82..4.82 rows=2 width=4)        ->  Index Scan using test1_pkey on test1
(cost=0.00..4.82rows=2 width=4)              Index Cond: (id = 2)
 
(6 rows)

<blockquote type="cite"><pre wrap="">PS: this is a bit off-topic for pgsql-general, please pursue it on
-hackers if you have more questions.</pre></blockquote>ok
</pre><br /><pre class="moz-signature" cols="72">-- 
Dennis
</pre>

Re: [GENERAL] Recursive optimization of IN subqueries

От
Tom Lane
Дата:
Dennis Haney <davh@diku.dk> writes:
> But this limited optimization makes me wonder, why the limitation to 
> optimizing '='?

In the first place, you wouldn't get any improvement anyway if the
combining operator is not '=' --- if it isn't, then merge and hash join
aren't applicable and so you're gonna end up with a nestloop anyhow,
which is no better than what the executor will do with a subselect.

In the second place, what the code is doing is dependent on an understanding
of the semantics of IN; I'm not sure it's applicable to, say,WHERE outervar > ANY (SELECT innervar FROM ...)
and it's definitely not applicable toWHERE outervar > ALL (SELECT innervar FROM ...)
In particular, the optimization paths that involve unique-ifying the
subselect output and then using it as the outer side of a join would
definitely not work for these sorts of things.

> And why must the lefthand of the sublink be a variable of the upper query?

Otherwise the expression isn't a join and I don't think the semantics are
the same as the code is expecting.

> Then I don't understand why it gives two different execution plans?

They look the same to me, other than that a different join rule is
needed (because after all IN is not the same thing as a straight join).
        regards, tom lane


Re: Recursive optimization of IN subqueries

От
"Simon Riggs"
Дата:
> Tom Lane writes
> 
> In the second place, what the code is doing is dependent on an
> understanding
> of the semantics of IN; I'm not sure it's applicable to, say,
>     WHERE outervar > ANY (SELECT innervar FROM ...)
> and it's definitely not applicable to
>     WHERE outervar > ALL (SELECT innervar FROM ...)
> In particular, the optimization paths that involve unique-ifying the
> subselect output and then using it as the outer side of a join would
> definitely not work for these sorts of things.
> 

I'm not sure if I've understood you correctly in the section above. Are
you saying that these types of queries don't have a meaningful or
defined response? Or just that they wouldn't be very well optimized as a
result of the unique-ifying code changes? Or have I just mis-read the
thread...

My understanding is that in ANSI SQL99, the expression expression > ALL (subquery) 

- is TRUE when expression is greater than every value in the set
of values returned by subquery. 
- is TRUE if subquery returns no values.

The expression expression > ANY (subquery) 

- is TRUE when expression is greater than at least one value of
the set of values returned by subquery.
- is FALSE if subsquery returns no values.

(As supported by Oracle 9iv2 and Teradata v2r5.0.)

Best regards, Simon



Re: Recursive optimization of IN subqueries

От
"Simon Riggs"
Дата:
> Tom Lane writes
> 
> In the second place, what the code is doing is dependent on an
> understanding
> of the semantics of IN; I'm not sure it's applicable to, say,
>     WHERE outervar > ANY (SELECT innervar FROM ...)
> and it's definitely not applicable to
>     WHERE outervar > ALL (SELECT innervar FROM ...)
> In particular, the optimization paths that involve unique-ifying the
> subselect output and then using it as the outer side of a join would
> definitely not work for these sorts of things.
> 

I'm not sure if I've understood you correctly in the section above. Are
you saying that these types of queries don't have a meaningful or
defined response? Or just that they wouldn't be very well optimized as a
result of the unique-ifying code changes? Or have I just mis-read the
thread...

My understanding is that in ANSI SQL99, the expression expression > ALL (subquery) 

- is TRUE when expression is greater than every value in the set
of values returned by subquery. 
- is TRUE if subquery returns no values.

The expression expression > ANY (subquery) 

- is TRUE when expression is greater than at least one value of
the set of values returned by subquery.
- is FALSE if subsquery returns no values.

(As supported by Oracle 9iv2 and Teradata v2r5.0.)

Best regards, Simon



Re: Recursive optimization of IN subqueries

От
Dennis Haney
Дата:
Simon Riggs wrote: <blockquote cite="mid002501c3e4df$be91c510$5e00030a@LaptopDellXP" type="cite"><blockquote
type="cite"><prewrap="">Tom Lane writes
 

In the second place, what the code is doing is dependent on an
understanding
of the semantics of IN; I'm not sure it's applicable to, say,WHERE outervar > ANY (SELECT innervar FROM ...)
and it's definitely not applicable toWHERE outervar > ALL (SELECT innervar FROM ...)
In particular, the optimization paths that involve unique-ifying the
subselect output and then using it as the outer side of a join would
definitely not work for these sorts of things.   </pre></blockquote><pre wrap="">
I'm not sure if I've understood you correctly in the section above. Are
you saying that these types of queries don't have a meaningful or
defined response? Or just that they wouldn't be very well optimized as a
result of the unique-ifying code changes? Or have I just mis-read the
thread... </pre></blockquote> I think Tom is refering to the context of the specific optimization.<br /> The
optimizationwe are discussing does nothing to correlated subqueries, and a uncorrolated subquery with > ALL/ANY is
actuallya computed constant and not a join.<br /><br /><pre class="moz-signature" cols="72">-- 
 
Dennis
</pre>

Re: Recursive optimization of IN subqueries

От
"Simon Riggs"
Дата:
<div class="Section1"><p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size:
10.0pt;font-family:Arial;color:navy">My mistake then. Better to check than let a logical hole in… Thanks for letting me
know,Simon</span></font><p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 
10.0pt;font-family:Arial;color:navy"> </span></font><div style="border:none;border-left:solid blue 1.5pt;padding:0cm
0cm0cm 4.0pt"><p class="MsoNormal"><font color="black" face="Tahoma" size="2"><span lang="EN-US"
style="font-size:10.0pt;font-family:Tahoma;color:windowtext">-----OriginalMessage-----<br /><b><span
style="font-weight:bold">From:</span></b>pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org]
<b><spanstyle="font-weight:bold">On Behalf Of </span></b>Dennis Haney<br /><b><span
style="font-weight:bold">Sent:</span></b>Tuesday, January 27, 2004 14:33<br /><b><span
style="font-weight:bold">To:</span></b>simon@2ndquadrant.com<br /><b><span style="font-weight:bold">Cc:</span></b> 'Tom
Lane';pgsql-hackers@postgresql.org<br /><b><span style="font-weight:bold">Subject:</span></b> Re: [HACKERS] Recursive
optimizationof IN subqueries</span></font><p class="MsoNormal"><font color="black" face="Times New Roman"
size="3"><spanstyle="font-size:12.0pt"> </span></font><p class="MsoNormal"><font color="black" face="Times New Roman"
size="3"><spanstyle="font-size:12.0pt">Simon Riggs wrote: </span></font><blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt"type="cite"><pre wrap=""><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt">Tom Lane writes</span></font></pre><pre><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt"> </span></font></pre><pre><font color="black" face="Courier New" size="2"><span
style="font-size:10.0pt">Inthe second place, what the code is doing is dependent on an</span></font></pre><pre><font
color="black"face="Courier New" size="2"><span style="font-size:10.0pt">understanding</span></font></pre><pre><font
color="black"face="Courier New" size="2"><span style="font-size:10.0pt">of the semantics of IN; I'm not sure it's
applicableto, say,</span></font></pre><pre><font color="black" face="Courier New" size="2"><span
style="font-size:10.0pt">WHERE outervar > ANY (SELECT innervar FROM ...)</span></font></pre><pre><font color="black"
face="CourierNew" size="2"><span style="font-size:10.0pt">and it's definitely not applicable
to</span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span style="font-size:10.0pt"> WHERE
outervar> ALL (SELECT innervar FROM ...)</span></font></pre><pre><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt">In particular, the optimization paths that involve unique-ifying
the</span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span style="font-size:10.0pt">subselect
outputand then using it as the outer side of a join would</span></font></pre><pre><font color="black" face="Courier
New"size="2"><span style="font-size:10.0pt">definitely not work for these sorts of
things.</span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span style="font-size:10.0pt">   
</span></font></pre></blockquote><prewrap=""><font color="black" face="Courier New" size="2"><span
style="font-size:10.0pt"> </span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span
style="font-size:10.0pt">I'mnot sure if I've understood you correctly in the section above.
Are</span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span style="font-size:10.0pt">you saying
thatthese types of queries don't have a meaningful or</span></font></pre><pre><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt">defined response? Or just that they wouldn't be very well optimized as
a</span></font></pre><pre><fontcolor="black" face="Courier New" size="2"><span style="font-size:10.0pt">result of the
unique-ifyingcode changes? Or have I just mis-read the</span></font></pre><pre><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt">thread...</span></font></pre><pre><font color="black" face="Courier New"
size="2"><spanstyle="font-size:10.0pt">  </span></font></pre><p class="MsoNormal"><font color="black" face="Times New
Roman"size="3"><span style="font-size:12.0pt">I think Tom is refering to the context of the specific optimization.<br
/>The optimization we are discussing does nothing to correlated subqueries, and a uncorrolated subquery with >
ALL/ANYis actually a computed constant and not a join.<br /><br /><br /></span></font><pre><font color="black"
face="CourierNew" size="2"><span style="font-size:10.0pt">-- </span></font></pre><pre><font color="black" face="Courier
New"size="2"><span style="font-size:10.0pt">Dennis</span></font></pre></div></div> 

Re: Recursive optimization of IN subqueries

От
Tom Lane
Дата:
"Simon Riggs" <simon@2ndquadrant.com> writes:
>> Tom Lane writes
>> In particular, the optimization paths that involve unique-ifying the
>> subselect output and then using it as the outer side of a join would
>> definitely not work for these sorts of things.

> I'm not sure if I've understood you correctly in the section above. Are
> you saying that these types of queries don't have a meaningful or
> defined response? Or just that they wouldn't be very well optimized as a
> result of the unique-ifying code changes?

I mean that if the unique-ifying implementation were used, it'd deliver
the wrong answer (too many rows out).  You could possibly carry through
a set of extensions to check which kind of sub-SELECT was in use and not
apply transformations that aren't correct, but it'd be a great deal more
complexity for something of marginal value.  As far as I've seen, people
don't use inequalities in ANY/ALL subselects very much, and so I'm not
excited about complicating the planner to support them better.
        regards, tom lane