Обсуждение: LATERAL

Поиск

Список

Период

Сортировка

LATERAL

От

Robert Haas

Дата:

07 сентября 2009 г., 04:06:30

I've attempted to search the archives for references to the SQL
LATERAL feature, which AIUI is fairly-frequently requested.  Most of
the discussion that I've found harks back to 2003, and that seems to
discuss more the need for eventual support of the feature than exactly
what it is or how to implement it.  Looking around the web a little, I
found these links:

http://farrago.sourceforge.net/design/CollectionTypes.html
http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

Based on reading through this discussion, it appears that LATERAL is
mostly a bit of syntactic sugar that requests that the parser allow
you to reference tables at the same query level.  Assuming that the
necessary executor support were present (which it's currently not),
it's unclear to me why one couldn't simply allow such references
unconditionally.  We currently explicitly block such references in
parse_clause.c, but the comments indicate that this is for reasons of
standards-conformance, not semantic ambiguity.

It appears that we're not completely without the ability to handle
queries of this type.  For example, this works:

select g, generate_series(1,g) as h from generate_series(1,10) g;

But this doesn't:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

Just for kicks, I tried removing the code that throws a syntax error
on the latter query, which resulted in a core dump inside
ExecEvalVar(), execQual.c:546, trying to deference a TupleTableSlot.
I'm guessing that this is because it can't get access to the right
variables - the plan actually looks quite sane:

rhaas=# explain select g, h from generate_series(1,10) g,
generate_series(1,g) h;                                  QUERY PLAN
--------------------------------------------------------------------------------Nested Loop  (cost=0.00..22512.50
rows=1000000width=8)  ->  Function Scan on generate_series g  (cost=0.00..12.50 rows=1000 width=4)  ->  Function Scan
ongenerate_series h  (cost=0.00..12.50 rows=1000 width=4)
 
(3 rows)

The first query just shows up a function scan, which means there's
some projection operation happening here that isn't exposed at the
plan level.  I'm guessing that what needs to happen here is that the
planner needs to be taught that LATERAL queries can only be
implemented as a nested loop (right? unless they're not really using
the LATERAL-ness...) and that the executor needs to be taught how to
make the vars for the outer side of a nestloop visible to the inner
side, when necessary.

Has anyone poked at this at all?

...Robert

Re: LATERAL

От

David Fetter

Дата:

07 сентября 2009 г., 04:45:25

On Sun, Sep 06, 2009 at 11:59:22PM -0400, Robert Haas wrote:
> I've attempted to search the archives for references to the SQL
> LATERAL feature, which AIUI is fairly-frequently requested.
> [snip]
> Has anyone poked at this at all?

I believe Andrew (RhodiumToad) Gierth is taking a look at
implementing the full LATERAL feature from SQL:2008.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: LATERAL

От

Peter Eisentraut

Дата:

07 сентября 2009 г., 07:43:57

On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:
> Based on reading through this discussion, it appears that LATERAL is
> mostly a bit of syntactic sugar that requests that the parser allow
> you to reference tables at the same query level.  Assuming that the
> necessary executor support were present (which it's currently not),
> it's unclear to me why one couldn't simply allow such references
> unconditionally.

Because joins can be reordered, whereas LATERAL creates a kind of
syntactic sequence point for join reordering.  To pick up your example:

> But this doesn't [work]:
> 
> select g, h from generate_series(1,10) g, generate_series(1,g) h;

You need to constrain the order of the from items in some way so the "g"
refers to something well-defined.  That's what LATERAL does.

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.

Re: LATERAL

От

Robert Haas

Дата:

07 сентября 2009 г., 23:06:57

On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:
>> Based on reading through this discussion, it appears that LATERAL is
>> mostly a bit of syntactic sugar that requests that the parser allow
>> you to reference tables at the same query level.  Assuming that the
>> necessary executor support were present (which it's currently not),
>> it's unclear to me why one couldn't simply allow such references
>> unconditionally.
>
> Because joins can be reordered, whereas LATERAL creates a kind of
> syntactic sequence point for join reordering.  To pick up your example:
>
>> But this doesn't [work]:
>>
>> select g, h from generate_series(1,10) g, generate_series(1,g) h;
>
> You need to constrain the order of the from items in some way so the "g"
> refers to something well-defined.  That's what LATERAL does.

I don't think so.  All joins constrain the join order, but none of
them except FULL JOIN constrain it completely, and this is no
exception.  Consider this query:

select * from foo f, generate_series(1, 10) g, generate_series(1, g) h
WHERE f.x = g;

There are two legal join orderings here: f {g h} and {f g} h, just as
there would be for a three-way inner join:

select * from foo f, goo g, hoo h WHERE f.x = g.x and g.x = h.x;

I believe that the join-order restrictions imposed by a LATERAL SRF
are identical to those that would be imposed by an inner join against
a table with join clauses referencing the same tables used in the
inputs to the SRF.  What is different is that there is only one
possible method of implementing the join, namely, for each outer row,
evaluate the SRF.  We can't hash or merge, and we can't swap the inner
and outer rels, but we do still have a choice as to whether to join
against h before or after joining against f.

> You could argue that the parser could infer the references and the
> resultant join ordering restrictions automatically, but perhaps it was
> deemed that an explicit specification would be less error-prone.

Well, the irony is that our current code does already infer the
references - and then disallows them.  It seems to me that if we
implement LATERAL(), we're basically going to be conditionalizing
those checks on whether the relevant subexpression has been wrapped in
LATERAL or not.  Introducing a new keyword would make sense if it were
possible to write a query that is valid without LATERAL(), but means
something different with LATERAL() - but I'm currently unable to
devise such a scenario.  Whether this is for want of creativity or
because none exists I'm not entirely sure.  Any ideas?

...Robert

Re: LATERAL

От

Tom Lane

Дата:

07 сентября 2009 г., 23:48:04

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
>> You could argue that the parser could infer the references and the
>> resultant join ordering restrictions automatically, but perhaps it was
>> deemed that an explicit specification would be less error-prone.

> Well, the irony is that our current code does already infer the
> references - and then disallows them.

Because as often as not, they're mistakes.  Please don't think
you're smarter than the spec here.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

08 сентября 2009 г., 00:06:19

On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
>>> You could argue that the parser could infer the references and the
>>> resultant join ordering restrictions automatically, but perhaps it was
>>> deemed that an explicit specification would be less error-prone.
>
>> Well, the irony is that our current code does already infer the
>> references - and then disallows them.
>
> Because as often as not, they're mistakes.  Please don't think
> you're smarter than the spec here.

You're frequently the first to criticize the spec, but I have no
interest in second-guessing whatever behavior the spec specifies for
this construct.  I'm just trying to understand it, and as far as I can
tell, LATERAL() is just a piece of syntactic sugar that allows
expressions within to reference FROM items at the same query level.
As far as I can tell, it doesn't change the semantic of any legal
queries - it just renders legal notation that otherwise would be
rejected.  But is my understanding correct?

I haven't got a copy of the spec, so that's a bit of a handicap.  If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that.  My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

...Robert

Re: LATERAL

От

Alvaro Herrera

Дата:

08 сентября 2009 г., 00:58:48

Robert Haas escribió:

> I haven't got a copy of the spec, so that's a bit of a handicap.  If
> someone who does can look this up and comment on how it's supposed to
> work, I would certainly appreciate that.  My understanding of it is
> currently based on random articles cherry-picked around the Internet
> and a handful of emails from archives.postgresql.org, which seems a
> little thin.

http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: LATERAL

От

Stephen Frost

Дата:

08 сентября 2009 г., 01:12:30

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > Because as often as not, they're mistakes.  Please don't think
> > you're smarter than the spec here.
>
> You're frequently the first to criticize the spec, but I have no
> interest in second-guessing whatever behavior the spec specifies for
> this construct.

I'm not entirely sure you followed what Tom was getting at here.  If you
did, feel free to ignore me.

> I'm just trying to understand it, and as far as I can
> tell, LATERAL() is just a piece of syntactic sugar that allows
> expressions within to reference FROM items at the same query level.

What I'm gathering is that this may be correct, though I don't know for
sure.  The point I think Tom was making is that even if it *is* just
syntactic sugar, we don't want to allow expressions to reference FROM
items at the same query level *unless* LATERAL is specified.  Your
earlier comments sounded like you would want to implement allowing
expressions to refer to FROM items at the same query level without
LATERAL being specified.

> I haven't got a copy of the spec, so that's a bit of a handicap.  If
> someone who does can look this up and comment on how it's supposed to
> work, I would certainly appreciate that.  My understanding of it is
> currently based on random articles cherry-picked around the Internet
> and a handful of emails from archives.postgresql.org, which seems a
> little thin.

You can get a 'draft' that's very close to the spec pretty easily..
Just do '??sql' in IRC sometime..
Enjoy,
    Stephen

Re: LATERAL

От

Robert Haas

Дата:

08 сентября 2009 г., 01:30:13

On Mon, Sep 7, 2009 at 9:12 PM, Stephen Frost<sfrost@snowman.net> wrote:
> Robert,
>
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> > Because as often as not, they're mistakes.  Please don't think
>> > you're smarter than the spec here.
>>
>> You're frequently the first to criticize the spec, but I have no
>> interest in second-guessing whatever behavior the spec specifies for
>> this construct.
>
> I'm not entirely sure you followed what Tom was getting at here.  If you
> did, feel free to ignore me.
>
>> I'm just trying to understand it, and as far as I can
>> tell, LATERAL() is just a piece of syntactic sugar that allows
>> expressions within to reference FROM items at the same query level.
>
> What I'm gathering is that this may be correct, though I don't know for
> sure.  The point I think Tom was making is that even if it *is* just
> syntactic sugar, we don't want to allow expressions to reference FROM
> items at the same query level *unless* LATERAL is specified.  Your
> earlier comments sounded like you would want to implement allowing
> expressions to refer to FROM items at the same query level without
> LATERAL being specified.

Fair enough.  I think I started to drift off in the direction of
making that argument, but it wasn't really my point.  The original
point I was trying to make is that we may not need to invent any kind
of new name-resolution or scoping in order to make LATERAL() work.
Instead, LATERAL() can just do everything normally with the exception
of not throwing the following errors when they would otherwise be
thrown:

subquery in FROM cannot refer to other relations of the same query level
function expression in FROM cannot refer to other relations of the
same query level

I'm not sure if I'm right about this, but if I am, it seems likely to
be a pretty straightforward change.

>> I haven't got a copy of the spec, so that's a bit of a handicap.  If
>> someone who does can look this up and comment on how it's supposed to
>> work, I would certainly appreciate that.  My understanding of it is
>> currently based on random articles cherry-picked around the Internet
>> and a handful of emails from archives.postgresql.org, which seems a
>> little thin.
>
> You can get a 'draft' that's very close to the spec pretty easily..
> Just do '??sql' in IRC sometime..

Thanks for the tip.

...Robert

Re: LATERAL

От

Stephen Frost

Дата:

08 сентября 2009 г., 02:09:01

* Robert Haas (robertmhaas@gmail.com) wrote:
> Fair enough.  I think I started to drift off in the direction of
> making that argument, but it wasn't really my point.

To be honest, I'm not sure I agree with Tom here on the value of
requiring a keyword to tell the system that you really mean what you
wrote.  On the other hand, it sounds like the spec is pretty clear on
this, and I don't feel we should violate it just because we think it's
being silly on this point.

> The original
> point I was trying to make is that we may not need to invent any kind
> of new name-resolution or scoping in order to make LATERAL() work.
> Instead, LATERAL() can just do everything normally with the exception
> of not throwing the following errors when they would otherwise be
> thrown:

I don't know for sure, but I do hope you're right.  I'd certainly love
to be able to do this in general..  There's a number of cases where I've
had to do the hokey-pokey to get around our lack of ability to do this
when using set-returning functions.

> I'm not sure if I'm right about this, but if I am, it seems likely to
> be a pretty straightforward change.

Please continue to explore it and propose a patch. :)
Thanks,
    Stephen

Re: LATERAL

От

Robert Haas

Дата:

08 сентября 2009 г., 02:13:03

On Mon, Sep 7, 2009 at 10:08 PM, Stephen Frost<sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Fair enough.  I think I started to drift off in the direction of
>> making that argument, but it wasn't really my point.
>
> To be honest, I'm not sure I agree with Tom here on the value of
> requiring a keyword to tell the system that you really mean what you
> wrote.  On the other hand, it sounds like the spec is pretty clear on
> this, and I don't feel we should violate it just because we think it's
> being silly on this point.

That's pretty much where I am, too, modulo needing to better
understand the aforesaid spec.

>> The original
>> point I was trying to make is that we may not need to invent any kind
>> of new name-resolution or scoping in order to make LATERAL() work.
>> Instead, LATERAL() can just do everything normally with the exception
>> of not throwing the following errors when they would otherwise be
>> thrown:
>
> I don't know for sure, but I do hope you're right.  I'd certainly love
> to be able to do this in general..  There's a number of cases where I've
> had to do the hokey-pokey to get around our lack of ability to do this
> when using set-returning functions.

Me too.

>> I'm not sure if I'm right about this, but if I am, it seems likely to
>> be a pretty straightforward change.
>
> Please continue to explore it and propose a patch. :)

Yeah, that's the not-so-easy part.  Gotta grok this executor stuff
first before I can even think about implementing this.

...Robert

Re: LATERAL

От

Peter Eisentraut

Дата:

08 сентября 2009 г., 06:45:58

On mån, 2009-09-07 at 19:06 -0400, Robert Haas wrote:
> On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> > Because joins can be reordered, whereas LATERAL creates a kind of
> > syntactic sequence point for join reordering.  To pick up your example:
> >
> >> But this doesn't [work]:
> >>
> >> select g, h from generate_series(1,10) g, generate_series(1,g) h;
> >
> > You need to constrain the order of the from items in some way so the "g"
> > refers to something well-defined.  That's what LATERAL does.
> 
> I don't think so.  All joins constrain the join order,

I carefully did not say "constrain the join order" but "constraint the
order of the from items" and "a *syntactic* sequence point".  If the
order of the from items is free, then you could also write the above
example as

select g, h from generate_series(1,g) h, generate_series(1,10) g;

but that would no longer be valid with LATERAL in there.

>  but none of
> them except FULL JOIN constrain it completely, and this is no
> exception.

FWIW, the spec says somewhere that LATERAL is not allowed at or near an
outer join.  I'll have to read up on it again.

Re: LATERAL

От

Andrew Gierth

Дата:

08 сентября 2009 г., 22:30:12

>>>>> "David" == David Fetter <david@fetter.org> writes:
>> I've attempted to search the archives for references to the SQL>> LATERAL feature, which AIUI is fairly-frequently
requested.>>[snip]>> Has anyone poked at this at all?

David> I believe Andrew (RhodiumToad) Gierth is taking a look atDavid> implementing the full LATERAL feature from
SQL:2008.

I've looked, but I've not actually had time to try any actual work on
implementation, so anyone else who fancies a go shouldn't hesitate.

Just to pick up on some points from the discussion:

1. LATERAL has to be explicit because it changes the scope of
references.  For example, in:
... (select ... FROM (select a AS b), (select b)) ...
the "b" in the second subselect could be an outer reference, but
it can't be a reference to the first subquery; whereas in:
... (select ... FROM (select a AS b), LATERAL (select b)) ...
the "b" in the second subselect refers to the result of the first
subselect.

2. LATERAL in general constrains both the join order and the join
plan, assuming any lateral references are actually made.

3. LATERAL specifically IS allowed with left outer joins, though the
syntax productions in the spec are sufficiently obscure that this
isn't obvious.  In general there are (as far as I can tell from the
syntax rules) two ways to use it:

SELECT ... FROM foo, LATERAL (bar)

or

SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...

Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
(syntax rule 2 for "<joined table>").

4. LATERAL allows some optimizations that aren't currently done, either
by explicitly rewriting the query, or (in theory) the optimizer itself
could consider a lateral plan (I believe Oracle does this). This would
apply to queries of this form:

SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

which currently forces the t2/t3 join to occur first even where t1 is
small; this could be rewritten with LATERAL as:

SELECT ... FROM t1      LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
wheret2.a=t1.a) s      ON true;

-- 
Andrew (irc:RhodiumToad)

Re: LATERAL

От

Robert Haas

Дата:

09 сентября 2009 г., 00:45:14

On Tue, Sep 8, 2009 at 6:29 PM, Andrew
Gierth<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "David" == David Fetter <david@fetter.org> writes:
>
>  >> I've attempted to search the archives for references to the SQL
>  >> LATERAL feature, which AIUI is fairly-frequently requested.
>  >> [snip]
>  >> Has anyone poked at this at all?
>
>  David> I believe Andrew (RhodiumToad) Gierth is taking a look at
>  David> implementing the full LATERAL feature from SQL:2008.
>
> I've looked, but I've not actually had time to try any actual work on
> implementation, so anyone else who fancies a go shouldn't hesitate.

Thanks for your thoughts - I appreciate it.

> Just to pick up on some points from the discussion:
>
> 1. LATERAL has to be explicit because it changes the scope of
> references.  For example, in:
> ... (select ... FROM (select a AS b), (select b)) ...
> the "b" in the second subselect could be an outer reference, but
> it can't be a reference to the first subquery; whereas in:
> ... (select ... FROM (select a AS b), LATERAL (select b)) ...
> the "b" in the second subselect refers to the result of the first
> subselect.

Can you provide a more complete example?  I'm unable to construct a
working example of this type.  For example:

rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;
ERROR:  subquery in FROM cannot refer to other relations of same query
level at character 50

Though this works as expected:
rhaas=# select (select 1 from (select a) x, (select b) y) from t1;

> 2. LATERAL in general constrains both the join order and the join
> plan, assuming any lateral references are actually made.

Peter seemed to be saying that LATERAL() must syntactically follow the
same-level FROM items to which it refers.  Is that your understanding
also?

> 3. LATERAL specifically IS allowed with left outer joins, though the
> syntax productions in the spec are sufficiently obscure that this
> isn't obvious.  In general there are (as far as I can tell from the
> syntax rules) two ways to use it:
>
> SELECT ... FROM foo, LATERAL (bar)
>
> or
>
> SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...
>
> Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
> (syntax rule 2 for "<joined table>").

Makes sense to me.

> 4. LATERAL allows some optimizations that aren't currently done, either
> by explicitly rewriting the query, or (in theory) the optimizer itself
> could consider a lateral plan (I believe Oracle does this). This would
> apply to queries of this form:
>
> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
>
> which currently forces the t2/t3 join to occur first even where t1 is
> small; this could be rewritten with LATERAL as:
>
> SELECT ...
>  FROM t1
>       LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
>                                   where t2.a=t1.a) s
>       ON true;

Well, you haven't actually commuted the joins here - how do you have
in mind for PostgreSQL to execute this?  I'm guessing that it's
something like a nest loop with t1 as the outer side and the lateral
subquery as the inner side, so that the executor repeatedly executes
"select * from t2 join t3 on t2.a = t3.a where t2.a = $1"?

...Robert

Re: LATERAL

От

Andrew Gierth

Дата:

10 сентября 2009 г., 03:25:51

>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>> Just to pick up on some points from the discussion:>> >> 1. LATERAL has to be explicit because it changes the scope
of>>references.  For example, in:>> ... (select ... FROM (select a AS b), (select b)) ...>> the "b" in the second
subselectcould be an outer reference, but>> it can't be a reference to the first subquery; whereas in:>> ... (select
...FROM (select a AS b), LATERAL (select b)) ...>> the "b" in the second subselect refers to the result of the first>>
subselect.
Robert> Can you provide a more complete example?  I'm unable toRobert> construct a working example of this type.  For
example:
Robert> rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;Robert> ERROR:  subquery in FROM cannot
referto other relations of same queryRobert> level at character 50 

That looks like a bug to me. The spec is explicit that the inner definition
of b is not in scope in the second subquery, and therefore that should parse
as an outer reference.
>> 2. LATERAL in general constrains both the join order and the join>> plan, assuming any lateral references are
actuallymade. 
Robert> Peter seemed to be saying that LATERAL() must syntacticallyRobert> follow the same-level FROM items to which it
refers. Is thatRobert> your understanding also? 

LATERAL references must be to items defined in the same FROM clause and
to the left of the LATERAL.

The relevant language of the spec seems to be:
 a) If TR is contained in a <from clause> FC with no intervening    <query expression>, then the scope clause SC of TR
isthe <select    statement: single row> or innermost <query specification> that    contains FC.  The scope of a range
variableof TR is the <select    list>, <where clause>, <group by clause>, <having clause>, and    <window clause> of
SC,together with every <lateral derived table>    that is simply contained in FC and is preceded by TR, and every
<collectionderived table> that is simply contained in FC and is    preceded by TR, and the <join condition> of all
<joinedtable>s    contained in SC that contain TR. If SC is the <query specification>    that is the <query expression
body>of a simple table query STQ,    then the scope of a range variable of TR also includes the <order    by clause> of
STQ.
>> 4. LATERAL allows some optimizations that aren't currently done, either>> by explicitly rewriting the query, or (in
theory)the optimizer itself>> could consider a lateral plan (I believe Oracle does this). This would>> apply to queries
ofthis form:>> >> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);>> >> which currently forces
thet2/t3 join to occur first even where t1 is>> small; this could be rewritten with LATERAL as:>> >> SELECT ...>>
FROM t1>>        LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)>>                            where
t2.a=t1.a)s>>        ON true; 
Robert> Well, you haven't actually commuted the joins here - how doRobert> you have in mind for PostgreSQL to execute
this? I'mRobert> guessing that it's something like a nest loop with t1 as theRobert> outer side and the lateral
subqueryas the inner side, soRobert> that the executor repeatedly executesRobert> "select * from t2 join t3 on t2.a =
t3.awhere t2.a = $1"? 

Yup.

The current execution plans for this type of query are completely
disastrous if t1 is small (or qualified so as to be small) and t2 and
t3 are large. Having LATERAL would allow the query to be rewritten to
perform reasonably; a bonus would be for the planner to consider the
lateral join automatically without requiring it to be explicitly
requested.

--
Andrew (irc:RhodiumToad)

Re: LATERAL

От

Robert Haas

Дата:

23 сентября 2009 г., 02:53:54

On Wed, Sep 9, 2009 at 11:25 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>  >> 4. LATERAL allows some optimizations that aren't currently done, either
>  >> by explicitly rewriting the query, or (in theory) the optimizer itself
>  >> could consider a lateral plan (I believe Oracle does this). This would
>  >> apply to queries of this form:
>  >>
>  >> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
>  >>
>  >> which currently forces the t2/t3 join to occur first even where t1 is
>  >> small; this could be rewritten with LATERAL as:
>  >>
>  >> SELECT ...
>  >>        FROM t1
>  >>        LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
>  >>                            where t2.a=t1.a) s
>  >>        ON true;
>
>  Robert> Well, you haven't actually commuted the joins here - how do
>  Robert> you have in mind for PostgreSQL to execute this?  I'm
>  Robert> guessing that it's something like a nest loop with t1 as the
>  Robert> outer side and the lateral subquery as the inner side, so
>  Robert> that the executor repeatedly executes
>  Robert> "select * from t2 join t3 on t2.a = t3.a where t2.a = $1"?
>
> Yup.
>
> The current execution plans for this type of query are completely
> disastrous if t1 is small (or qualified so as to be small) and t2 and
> t3 are large. Having LATERAL would allow the query to be rewritten to
> perform reasonably; a bonus would be for the planner to consider the
> lateral join automatically without requiring it to be explicitly
> requested.

I've been turning this one over in my head.  It seems to me that this
is very similar to what we already do with inner index-scans, only
generalized to joinrels.  When we consider nested loop plans in
match_unsorted_outer(), we really consider two quite different types
of plans - call them parameterized and unparameterized.  The
unparameterized variants considered regardless of the details of the
inner plan, and basically rescan the same identical inner side once
per outer row, possibly with a materialize node interposed.  The
parameterized variants are what the inner index-scan code is looking
at: they also involve rescanning the inner path, but not the same set
of tuples every time.   Instead, we look for cases where an index is
available that can support a lookup of the specific values that the
outer side needs on each pass as an index condition.

Currently, however, we only consider this possibility when the inner
rel is NOT a joinrel.  It seems like it might be possible to change
this, but it doesn't look straightforward.  When the inner rel is a
baserel, there's really nothing to do except grovel through the
indexes looking for something useful, and then estimate the cost of
using it.  When the inner rel is a joinrel, it's a lot more
complicated.  A parameterized nested loop might be right even if only
some of the rels making up the joinrel are indexed (imagine t2 IJ t3
IJ t4 where t2 and t3 are large and indexed and t4 is small and
unindexed).  In addition, the relations making up the inner rel could
be joined in any order and using a variety of strategies.  The path
that is best when trying to suck down the ENTIRE inner rel doesn't
necessarily bear any resemblence to the plan that is best when trying
to suck down only a part of it.

To some extent, this seems like a tangent as far as LATERAL is
concerned: I think the primary reason why people want LATERAL
(certainly, why I want it) is for SRFs, for which there isn't any
choice of execution plan anyway.  But it's certainly an interesting
optimization problem, and I wish I had some better ideas on how to
tackle it.

...Robert

Re: LATERAL

От

Tom Lane

Дата:

23 сентября 2009 г., 03:22:09

Robert Haas <robertmhaas@gmail.com> writes:
> I've been turning this one over in my head.  It seems to me that this
> is very similar to what we already do with inner index-scans, only
> generalized to joinrels.

Right.

> Currently, however, we only consider this possibility when the inner
> rel is NOT a joinrel.  It seems like it might be possible to change
> this, but it doesn't look straightforward.

Well, it's straightforward enough in theory, it's just the exponential
growth in the number of join paths to consider that's a problem :-(.
I think what we'd need is a heuristic to limit the paths considered.

I think Andrew was suggesting that we not attempt to consider this
automatically, but only when the user writes the query in a way that
exposes it directly via LATERAL.  While that goes against my normal
instincts for the planner, it isn't unreasonable as a first step.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

18 октября 2009 г., 00:58:55

On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Currently, however, we only consider this possibility when the inner
>> rel is NOT a joinrel.  It seems like it might be possible to change
>> this, but it doesn't look straightforward.
>
> Well, it's straightforward enough in theory, it's just the exponential
> growth in the number of join paths to consider that's a problem :-(.
> I think what we'd need is a heuristic to limit the paths considered.

[ picking this up again now that CommitFest is over ]

So that I don't have to keep referring to "this possibility" and
similar circumlocution, let's define an index-accelerated path (feel
free to suggest a different term) to be the generalization to joinrels
of what we now call a nested inner index-scan.  In other words, they
can only be usefully used on the inner side of a nested loop, where
they'll be rescanned with different parameters for each outer row, and
they can only ever win when some part of the path involves a *partial*
index scan that can used one of the passed parameters as an index
qual.

For a fixed a set of parameters to be pushed down from the outer side,
it seems to me that we could build up a set of index-accelerated paths
for a joinrel {A B} by considering the combination of (1) an
index-accelerated path for A joined to an index-accelerated path for
B, or (2) an index-accelerated path for A joined to the cheapest total
path for B.  I believe we don't need to worry about path keys because
this will only ever be used on the inner side of a nested loop, so the
pathkeys will be ignored anyway.  In other words, the first heuristic
is that you can't build up an index-accelerated path for the joinrel
unless there is an index-accelerated path for a least one of its
baserels.

That still leaves a lot of silly paths, though.  In many cases, if
you're thinking about joining A to {B C} using an index-accelerated
path, you'd be just as well off joining to B first and then to C.  So
it might be that we only need to consider index-accelerated paths when
there is no legal join between the LHS and a subset of the RHS.  But
I'm not completely sure that I'm right about this, nor whether it's
easy to check for.

The other problem I see here is that the bottom-up approach that we
use in general is going to be difficult to apply here, because the set
of paths will vary depending on what parameters are pushed down from
the outer side.

> I think Andrew was suggesting that we not attempt to consider this
> automatically, but only when the user writes the query in a way that
> exposes it directly via LATERAL.  While that goes against my normal
> instincts for the planner, it isn't unreasonable as a first step.

Possibly, but I'm not sure we have a good enough design sketch at this
point to even know whether such a restriction would be useful.

Another thought, only semi-related.  One of the big use cases for
LATERAL in general is to use a set-returning function in the FROM
clause that uses vars from a preceding FROM item.  I am idly wondering
if there's a reason why ExecProject is not its own node type.  It
seems that we have close to the same logic scattered through a whole
bunch of different node types...

...Robert

Re: LATERAL

От

Tom Lane

Дата:

18 октября 2009 г., 01:08:42

Robert Haas <robertmhaas@gmail.com> writes:
> Another thought, only semi-related.  One of the big use cases for
> LATERAL in general is to use a set-returning function in the FROM
> clause that uses vars from a preceding FROM item.  I am idly wondering
> if there's a reason why ExecProject is not its own node type.

You generally need it everywhere.  Scan nodes need it because you don't
want unused columns propagating upwards, and join nodes need it because
the two input tuples have to be unified somehow.  We do skip projection
ability in a few node types, but I doubt it would be profitable to
remove it from the rest.
        regards, tom lane

Re: LATERAL

От

Tom Lane

Дата:

18 октября 2009 г., 03:45:28

Robert Haas <robertmhaas@gmail.com> writes:
> That still leaves a lot of silly paths, though.  In many cases, if
> you're thinking about joining A to {B C} using an index-accelerated
> path, you'd be just as well off joining to B first and then to C.  So
> it might be that we only need to consider index-accelerated paths when
> there is no legal join between the LHS and a subset of the RHS.

Yeah.  If there are no join order constraints, it's always possible to
form a plan such that you join a given rel only once you have all the
rels needed (to provide values for the inner indexscan) on the lefthand
side.  I think this is probably the main reason why the issue was not
treated in the planner originally.  So maybe the way to think about
this is as a way of dealing with join order constraints without losing
the benefits of inner indexscans.

> The other problem I see here is that the bottom-up approach that we
> use in general is going to be difficult to apply here, because the set
> of paths will vary depending on what parameters are pushed down from
> the outer side.

Well, we deal with that already --- the set of possible inner indexscan
paths already varies depending on what the LHS is.  I think the point
here is that we'd be considering an inner path that's against an LHS
that it's not legal to join the inner rel to *directly*.  Such a path
would only be legal if we later join to that LHS at a higher join level.
So we'd be keeping around some tentative paths that might not ever form
a valid join plan.

Maybe we should turn around the way that inner indexscan paths are
made.  Currently we form them on-the-fly while considering a valid
join combination.  Maybe we should build them all at the first level
(driving this off the set of available join clauses for each base rel)
and mark each such path as "requires a join to this other set of rels
to be valid".  But then we'd go ahead and join such paths to *other*
rels, keeping the resulting join paths still marked as requiring the
same future join.  Once that join actually happens, the resulting path
becomes fully valid.  Only a join to a proper subset of the future-join
requirement would be disallowed meanwhile.

I'm not even sure this would be slower or more complicated than what we
do now --- if you look at the logic that caches potential inner
indexscan plans, it's almost doing this already.  It would result in
considering more join paths, but only ones that have some plausible use.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

18 октября 2009 г., 11:02:10

On Sat, Oct 17, 2009 at 10:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> That still leaves a lot of silly paths, though.  In many cases, if
>> you're thinking about joining A to {B C} using an index-accelerated
>> path, you'd be just as well off joining to B first and then to C.  So
>> it might be that we only need to consider index-accelerated paths when
>> there is no legal join between the LHS and a subset of the RHS.
>
> Yeah.  If there are no join order constraints, it's always possible to
> form a plan such that you join a given rel only once you have all the
> rels needed (to provide values for the inner indexscan) on the lefthand
> side.  I think this is probably the main reason why the issue was not
> treated in the planner originally.  So maybe the way to think about
> this is as a way of dealing with join order constraints without losing
> the benefits of inner indexscans.
>
>> The other problem I see here is that the bottom-up approach that we
>> use in general is going to be difficult to apply here, because the set
>> of paths will vary depending on what parameters are pushed down from
>> the outer side.
>
> Well, we deal with that already --- the set of possible inner indexscan
> paths already varies depending on what the LHS is.  I think the point
> here is that we'd be considering an inner path that's against an LHS
> that it's not legal to join the inner rel to *directly*.  Such a path
> would only be legal if we later join to that LHS at a higher join level.
> So we'd be keeping around some tentative paths that might not ever form
> a valid join plan.
>
> Maybe we should turn around the way that inner indexscan paths are
> made.  Currently we form them on-the-fly while considering a valid
> join combination.  Maybe we should build them all at the first level
> (driving this off the set of available join clauses for each base rel)
> and mark each such path as "requires a join to this other set of rels
> to be valid".  But then we'd go ahead and join such paths to *other*
> rels, keeping the resulting join paths still marked as requiring the
> same future join.  Once that join actually happens, the resulting path
> becomes fully valid.  Only a join to a proper subset of the future-join
> requirement would be disallowed meanwhile.
>
> I'm not even sure this would be slower or more complicated than what we
> do now --- if you look at the logic that caches potential inner
> indexscan plans, it's almost doing this already.  It would result in
> considering more join paths, but only ones that have some plausible use.

Wow, that's pretty sneaky.  I had to read this email twice before I
understood what you were proposing.  It sounds to me like it will
work.  I'm not 100% sure what the impact on planner performance will
be, but it's at least plausible that it will be OK.

I think you should only ever join an incomplete inner-indexscan path
to (1) another inner-indexscan path or (2) the cheapest total path for
*exactly* the future-join requirement.  Anything else doesn't seem
productive.

...Robert

Re: LATERAL

От

Tom Lane

Дата:

18 октября 2009 г., 15:30:29

Robert Haas <robertmhaas@gmail.com> writes:
> I think you should only ever join an incomplete inner-indexscan path
> to (1) another inner-indexscan path or (2) the cheapest total path for
> *exactly* the future-join requirement.  Anything else doesn't seem
> productive.

I don't see any reason to constrain the form of joins before the
future-join requirement.  Once you get to the join that satisfies
that requirement, obviously you must implement it as a nestloop
with the inner-indexscan path on the inside.  But there shouldn't
be any more constraint beyond that one for that join, either:* you might want a fast-start plan not cheapest total*
sortorderings of the outer path might be useful

We know that sort ordering of the inner indexscan or its joins
won't be useful once we reach the future-join level, so it might
be possible to discard paths that are not cheapest total cost
below that level.  The problem is to distinguish sort orderings
that are useful within joins below that level.  You could avoid that
if you could convince yourself that there's no point in ever doing
a mergejoin at such a level, but I don't think I believe that.

It may well be that there's no point in being too tense about this issue
anyway.  The planner will only consider early joins to rels that have
join clauses to the rel with the inner-indexscan path.  There probably
won't be that many of them.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

18 октября 2009 г., 19:37:00

On Sun, Oct 18, 2009 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think you should only ever join an incomplete inner-indexscan path
>> to (1) another inner-indexscan path or (2) the cheapest total path for
>> *exactly* the future-join requirement.  Anything else doesn't seem
>> productive.
>
> I don't see any reason to constrain the form of joins before the
> future-join requirement.  Once you get to the join that satisfies
> that requirement, obviously you must implement it as a nestloop
> with the inner-indexscan path on the inside.  But there shouldn't
> be any more constraint beyond that one for that join, either:
>        * you might want a fast-start plan not cheapest total
>        * sort orderings of the outer path might be useful
>
> We know that sort ordering of the inner indexscan or its joins
> won't be useful once we reach the future-join level, so it might
> be possible to discard paths that are not cheapest total cost
> below that level.

Yeah, this was what I was driving at, but I got turned around in my
head and was explaining it incorrectly.

> The problem is to distinguish sort orderings
> that are useful within joins below that level.  You could avoid that
> if you could convince yourself that there's no point in ever doing
> a mergejoin at such a level, but I don't think I believe that.
>
> It may well be that there's no point in being too tense about this issue
> anyway.  The planner will only consider early joins to rels that have
> join clauses to the rel with the inner-indexscan path.  There probably
> won't be that many of them.

You could probably convince me that a merge join is not going to be
too useful (how often can you want a merge join on the inner side of a
nested loop?  ... especially when there are partial index scans
involved) but I think your last point here is well-taken.  We've
certainly turned a corner from "it's just the exponential growth in
the number of join paths to consider that's a problem".  :-)

...Robert

Re: LATERAL

От

Tom Lane

Дата:

18 октября 2009 г., 19:58:03

Robert Haas <robertmhaas@gmail.com> writes:
> You could probably convince me that a merge join is not going to be
> too useful (how often can you want a merge join on the inner side of a
> nested loop?

Why not?  As Andrew pointed out, what we're really trying to accomplish
here is consider sub-join plans that are parameterized by a value
obtained from an outer relation.  I think we shouldn't artificially
limit what we consider.

But anyway I think we're on the same page here: what we ought to do is
try implementing this scheme without any extra restrictions on what it
considers, and see what the performance is like.  We can try to limit
what it considers if it turns out not to work well in the simplest
form.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

18 октября 2009 г., 20:05:15

On Sun, Oct 18, 2009 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You could probably convince me that a merge join is not going to be
>> too useful (how often can you want a merge join on the inner side of a
>> nested loop?
>
> Why not?  As Andrew pointed out, what we're really trying to accomplish
> here is consider sub-join plans that are parameterized by a value
> obtained from an outer relation.  I think we shouldn't artificially
> limit what we consider.
>
> But anyway I think we're on the same page here: what we ought to do is
> try implementing this scheme without any extra restrictions on what it
> considers, and see what the performance is like.  We can try to limit
> what it considers if it turns out not to work well in the simplest
> form.

And then when we get done we can consider implementing $SUBJECT, which
is no longer what this thread is about at all.  :-)

...Robert

Re: LATERAL

От

Greg Stark

Дата:

19 октября 2009 г., 20:40:18

On Sun, Oct 18, 2009 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You could probably convince me that a merge join is not going to be
>> too useful (how often can you want a merge join on the inner side of a
>> nested loop?
>
> Why not?  As Andrew pointed out, what we're really trying to accomplish
> here is consider sub-join plans that are parameterized by a value
> obtained from an outer relation.  I think we shouldn't artificially
> limit what we consider.

Am I understanding you right that a typical case of this might be something like

nested loop   index scan expecting 1 record   merge join       index scan on partial index where col = outer.foo and
col2
between a and b       some other scan

or

nested loop   index scan expecting 1 record   merge join       index scan on <col1,col2> where col1 = outer.foo and
col2
between a and b       some other scan

Ie, where the nested loop is a degenerate nested loop which only
expects a single value and provides a parameter which allows some
partial index to work or allows for some other index scan by providing
a higher order key element?

--
greg

Re: LATERAL

От

Tom Lane

Дата:

19 октября 2009 г., 22:26:37

Greg Stark <gsstark@mit.edu> writes:
> nested loop
>     index scan expecting 1 record
>     merge join
>         index scan on <col1,col2> where col1 = outer.foo and col2
> between a and b
>         some other scan

> Ie, where the nested loop is a degenerate nested loop which only
> expects a single value and provides a parameter which allows some
> partial index to work or allows for some other index scan by providing
> a higher order key element?

Right.  I don't see any particular reason to assume the inner path
is iterated only once, either.  If the key value coming from the outer
path is sufficiently useful, this could be a win even with multiple
iterations, as compared to having to scan the whole of some large
relation or other ...
        regards, tom lane

Re: LATERAL

От

Andrew Gierth

Дата:

19 октября 2009 г., 22:38:59

>>>>> "Greg" == Greg Stark <gsstark@mit.edu> writes:
>> Why not?  As Andrew pointed out, what we're really trying to>> accomplish here is consider sub-join plans that are
parameterized>>by a value obtained from an outer relation.  I think we shouldn't>> artificially limit what we consider. 
Greg> Am I understanding you right that a typical case of this mightGreg> be something like
Greg> nested loopGreg>     index scan expecting 1 recordGreg>     merge joinGreg>         index scan on partial index
wherecol = outer.foo and col2Greg> between a and bGreg>         some other scan 

no, because you could never pick the partial index at plan time.
Greg> or
Greg> nested loopGreg>     index scan expecting 1 recordGreg>     merge joinGreg>         index scan on <col1,col2>
wherecol1 = outer.foo and col2Greg> between a and bGreg>         some other scan 
Greg> Ie, where the nested loop is a degenerate nested loop whichGreg> only expects a single value and provides a
parameterwhichGreg> allows some partial index to work or allows for some otherGreg> index scan by providing a higher
orderkey element? 

The nested loop does NOT have to be degenerate. Consider queries of
this form:

select * from small             left join (big1 join big2 on (big1.id=big2.id))             on (small.id=big1.id);

Right now, the only way pg can plan this is to do a hashjoin or
mergejoin of the _entire content of big1 and big2_ and join the
result against "small" (again in a hashjoin or mergejoin plan).
This becomes excessively slow compared to the "ideal" plan:
 nested loop     seqscan on small     nested loop        indexscan on big1 where id=small.id        indexscan on big2
whereid=small.id (or big1.id which is equiv) 

(The same argument applies if "small" is not actually small but has
restriction clauses)

--
Andrew (irc:RhodiumToad)

Re: LATERAL

От

Robert Haas

Дата:

18 декабря 2009 г., 02:13:44

On Sun, Oct 18, 2009 at 2:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You could probably convince me that a merge join is not going to be
>> too useful (how often can you want a merge join on the inner side of a
>> nested loop?
>
> Why not?  As Andrew pointed out, what we're really trying to accomplish
> here is consider sub-join plans that are parameterized by a value
> obtained from an outer relation.  I think we shouldn't artificially
> limit what we consider.
>
> But anyway I think we're on the same page here: what we ought to do is
> try implementing this scheme without any extra restrictions on what it
> considers, and see what the performance is like.  We can try to limit
> what it considers if it turns out not to work well in the simplest
> form.

You mention what "we" ought to do here, so - is this something that
you're planning to have a go at?

Another question I have - while generalizing the inner-indexscan
machinery is an interesting join optimization technique, I'm thinking
that it actually has very little to do with LATERAL.  Is there any
reason to suppose that one or the other needs to be done first?

...Robert

Re: LATERAL

От

Hitoshi Harada

Дата:

19 декабря 2009 г., 16:49:39

2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>:
> Right now, the only way pg can plan this is to do a hashjoin or
> mergejoin of the _entire content of big1 and big2_ and join the
> result against "small" (again in a hashjoin or mergejoin plan).
> This becomes excessively slow compared to the "ideal" plan:
>
>  nested loop
>      seqscan on small
>      nested loop
>         indexscan on big1 where id=small.id
>         indexscan on big2 where id=small.id (or big1.id which is equiv)
>
> (The same argument applies if "small" is not actually small but has
> restriction clauses)

I have a similar issue on my mind, but is this the same as the topic?

SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)

The ideal plan is SeqScan on small with filtering sub query aggregate
on large by small.id but the actual plan is full aggregate on large
since the planner doesn't push down outer qual to aggregate node. The
output will discard almost all of agged's output.


Regards,

--
Hitoshi Harada

Re: LATERAL

От

Robert Haas

Дата:

19 декабря 2009 г., 17:11:49

On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> 2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>> Right now, the only way pg can plan this is to do a hashjoin or
>> mergejoin of the _entire content of big1 and big2_ and join the
>> result against "small" (again in a hashjoin or mergejoin plan).
>> This becomes excessively slow compared to the "ideal" plan:
>>
>>  nested loop
>>      seqscan on small
>>      nested loop
>>         indexscan on big1 where id=small.id
>>         indexscan on big2 where id=small.id (or big1.id which is equiv)
>>
>> (The same argument applies if "small" is not actually small but has
>> restriction clauses)
>
> I have a similar issue on my mind, but is this the same as the topic?
>
> SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
> large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)
>
> The ideal plan is SeqScan on small with filtering sub query aggregate
> on large by small.id but the actual plan is full aggregate on large
> since the planner doesn't push down outer qual to aggregate node. The
> output will discard almost all of agged's output.

I just tried this and it works for me.

create table foo (id serial, name varchar, primary key (id));
create table bar (id serial, foo_id integer references foo (id), name
varchar, primary key (id));
insert into foo (name) select random()::varchar from generate_series(1,1000);
insert into bar (foo_id, name) select (g%10)+1, random()::varchar from
generate_series(1,10000) g;
explain select * from foo inner join (select foo_id, sum(1) from bar
group by 1) x on foo.id = x.foo_id where x.foo_id = 1;

...Robert

Re: LATERAL

От

Hitoshi Harada

Дата:

19 декабря 2009 г., 17:41:01

2009/12/20 Robert Haas <robertmhaas@gmail.com>:
> On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> 2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>> Right now, the only way pg can plan this is to do a hashjoin or
>>> mergejoin of the _entire content of big1 and big2_ and join the
>>> result against "small" (again in a hashjoin or mergejoin plan).
>>> This becomes excessively slow compared to the "ideal" plan:
>>>
>>>  nested loop
>>>      seqscan on small
>>>      nested loop
>>>         indexscan on big1 where id=small.id
>>>         indexscan on big2 where id=small.id (or big1.id which is equiv)
>>>
>>> (The same argument applies if "small" is not actually small but has
>>> restriction clauses)
>>
>> I have a similar issue on my mind, but is this the same as the topic?
>>
>> SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
>> large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)
>>
>> The ideal plan is SeqScan on small with filtering sub query aggregate
>> on large by small.id but the actual plan is full aggregate on large
>> since the planner doesn't push down outer qual to aggregate node. The
>> output will discard almost all of agged's output.
>
> I just tried this and it works for me.
>
> create table foo (id serial, name varchar, primary key (id));
> create table bar (id serial, foo_id integer references foo (id), name
> varchar, primary key (id));
> insert into foo (name) select random()::varchar from generate_series(1,1000);
> insert into bar (foo_id, name) select (g%10)+1, random()::varchar from
> generate_series(1,10000) g;
> explain select * from foo inner join (select foo_id, sum(1) from bar
> group by 1) x on foo.id = x.foo_id where x.foo_id = 1;
>
> ...Robert
>

Ah your example works for me, too. My issue is:

explain select * from foo inner join (select foo_id, sum(1) from bar
group by 1) x on foo.id = x.foo_id where foo.id = 1;

where foo.id = 1 (not where x.foo_id = 1).
And I now figured out it's another problem.

Regards,


--
Hitoshi Harada

Re: LATERAL

От

Robert Haas

Дата:

19 декабря 2009 г., 17:43:32

On Thu, Dec 17, 2009 at 10:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Another question I have - while generalizing the inner-indexscan
> machinery is an interesting join optimization technique, I'm thinking
> that it actually has very little to do with LATERAL.  Is there any
> reason to suppose that one or the other needs to be done first?

And the winner is... yes.  Or at least, I think so.  One of the major
reasons why people want LATERAL() is for SRFs, but currently, even if
you beat the code into allowing a SRF with an outer reference, the
planner can easily be persuaded to run the SRF on the outer side of a
join with the dependency as the inner side, which ain't gonna work.
(Even you jigger the query so that the planner gets them on the
correct sides of the join, the executor fails, but that's a different
problem.)

The idea Tom came up with back in October is to allow paths to be
tagged with a set of rels to which they must in the future be joined
in order for the path to be allowable.  The point of that exercise was
to generalize the current inner-indexscan machinery so that we can
create that type of plan in match_unsorted_outer() even when the inner
side is a joinrel.  But, it strikes me that what we need to allow a
function scan with an outer reference is remarkably similar - the
function scan can only be used as the inner side of a nestloop with a
certain set of rels on the outer side.

On the other hand, it's not exactly the same, either.  In the case of
a construct like A LJ (B IJ C), partial-index scan paths for B and C
will require a subsequent nest-join to A to become fully valid, but
there will also be other paths that don't.  But for something like "A,
LATERAL (some_srf(A.x))", the ONLY path for the "rel" defined by
some_srf(A.x) has a future-join requirement of {A}.  It's not clear to
me whether there's anything useful that can be done with this
knowledge.

Incidentally, the reason why the executor chokes trying to execute a
SRF with an outer reference is because ExecEvalVar() craps out trying
to dereference a null TupleTableSlot.  If I'm understanding this
correctly, that, in turn, happens because the variable that we're
trying to deference is marked as neither INNER nor OUTER, so it's
assumed to be from a scan, but there's no scan node.  Going even
further from my area of actually understanding what's going on, I
think this needs to be fixed by adjusting setrefs.c.  Allowing
LATERAL(), or for that matter the generalized inner-index scan stuff,
will I think mean that set_inner_join_references() will need to handle
a lot more cases than it current does.  I don't understand this code
well enough to begin to speculate as to what should happen here.

...Robert

Re: LATERAL

От

Tom Lane

Дата:

19 декабря 2009 г., 19:03:14

Robert Haas <robertmhaas@gmail.com> writes:
> Incidentally, the reason why the executor chokes trying to execute a
> SRF with an outer reference is because ExecEvalVar() craps out trying
> to dereference a null TupleTableSlot.  If I'm understanding this
> correctly, that, in turn, happens because the variable that we're
> trying to deference is marked as neither INNER nor OUTER, so it's
> assumed to be from a scan, but there's no scan node.  Going even
> further from my area of actually understanding what's going on, I
> think this needs to be fixed by adjusting setrefs.c.

Well, no: we can't handle such references as OUTER vars because the
OUTER slot is likely to be in use already in the sub-join.  It would be
even messier if you wanted several references to different outer
relations.

I believe the correct approach is probably to treat values that need to
be propagated into the inner side as executor parameters.  This could
replace the existing, rather crocky, management of values passed into a
nestloop inner indexscan.  The mechanisms that deal with forcing rescans
of subplans affected by a changed parameter value would be very helpful
here too.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

19 декабря 2009 г., 20:43:56

On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Incidentally, the reason why the executor chokes trying to execute a
>> SRF with an outer reference is because ExecEvalVar() craps out trying
>> to dereference a null TupleTableSlot.  If I'm understanding this
>> correctly, that, in turn, happens because the variable that we're
>> trying to deference is marked as neither INNER nor OUTER, so it's
>> assumed to be from a scan, but there's no scan node.  Going even
>> further from my area of actually understanding what's going on, I
>> think this needs to be fixed by adjusting setrefs.c.
>
> Well, no: we can't handle such references as OUTER vars because the
> OUTER slot is likely to be in use already in the sub-join.  It would be
> even messier if you wanted several references to different outer
> relations.

Oh.  Yeah.

> I believe the correct approach is probably to treat values that need to
> be propagated into the inner side as executor parameters.  This could
> replace the existing, rather crocky, management of values passed into a
> nestloop inner indexscan.  The mechanisms that deal with forcing rescans
> of subplans affected by a changed parameter value would be very helpful
> here too.

What is the best place to look for the existing, rather crocky code?
I have to admit that the whole mechanism by which paths get
transformed into plans and handed off to the executor is still rather
opaque to me.

...Robert

Re: LATERAL

От

Tom Lane

Дата:

19 декабря 2009 г., 20:47:57

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe the correct approach is probably to treat values that need to
>> be propagated into the inner side as executor parameters. �This could
>> replace the existing, rather crocky, management of values passed into a
>> nestloop inner indexscan.

> What is the best place to look for the existing, rather crocky code?

Follow the second argument of ExecReScan from nodeNestloop to
nodeIndexscan.
        regards, tom lane

Re: LATERAL

От

Robert Haas

Дата:

20 декабря 2009 г., 03:01:49

On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I believe the correct approach is probably to treat values that need to
>>> be propagated into the inner side as executor parameters.  This could
>>> replace the existing, rather crocky, management of values passed into a
>>> nestloop inner indexscan.
>
>> What is the best place to look for the existing, rather crocky code?
>
> Follow the second argument of ExecReScan from nodeNestloop to
> nodeIndexscan.

Yeah, this is grotty.  It appears that the comment introducing
ExecReScan() is somewhat incorrect.  It asserts that exprCtxt is used
only

Re: LATERAL

От

Robert Haas

Дата:

20 декабря 2009 г., 03:06:49

On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I believe the correct approach is probably to treat values that need to
>>>> be propagated into the inner side as executor parameters.  This could
>>>> replace the existing, rather crocky, management of values passed into a
>>>> nestloop inner indexscan.
>>
>>> What is the best place to look for the existing, rather crocky code?
>>
>> Follow the second argument of ExecReScan from nodeNestloop to
>> nodeIndexscan.
>
> Yeah, this is grotty.  It appears that the comment introducing
> ExecReScan() is somewhat incorrect.  It asserts that exprCtxt is used
> only

Sigh.

...is used only for index scans.  However, it's actually also used for
bitmap scans (both heap and index) and TID scans.  Also, there appears
to be an effort by nodes that don't use exprCtxt directly to propagate
down through the node tree, which doesn't seem to make much sense if
this is only intended to be used on the inner side of a nestloop.

...Robert

Re: LATERAL

От

Tom Lane

Дата:

20 декабря 2009 г., 04:46:28

Robert Haas <robertmhaas@gmail.com> writes:
>> Yeah, this is grotty. �It appears that the comment introducing
>> ExecReScan() is somewhat incorrect. �It asserts that exprCtxt is used
>> only

> Sigh.

> ...is used only for index scans.  However, it's actually also used for
> bitmap scans (both heap and index) and TID scans.

Yeah, the comment was probably correct when written.

> Also, there appears
> to be an effort by nodes that don't use exprCtxt directly to propagate
> down through the node tree, which doesn't seem to make much sense if
> this is only intended to be used on the inner side of a nestloop.

That's just dead code, which is a good thing because the coverage is
pretty incomplete.
        regards, tom lane

Re: LATERAL

От

Bruce Momjian

Дата:

19 февраля 2010 г., 02:35:15

Robert Haas wrote:
> On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>> I believe the correct approach is probably to treat values that need to
> >>>> be propagated into the inner side as executor parameters. ?This could
> >>>> replace the existing, rather crocky, management of values passed into a
> >>>> nestloop inner indexscan.
> >>
> >>> What is the best place to look for the existing, rather crocky code?
> >>
> >> Follow the second argument of ExecReScan from nodeNestloop to
> >> nodeIndexscan.
> >
> > Yeah, this is grotty. ?It appears that the comment introducing
> > ExecReScan() is somewhat incorrect. ?It asserts that exprCtxt is used
> > only
> 
> Sigh.
> 
> ...is used only for index scans.  However, it's actually also used for
> bitmap scans (both heap and index) and TID scans.  Also, there appears
> to be an effort by nodes that don't use exprCtxt directly to propagate
> down through the node tree, which doesn't seem to make much sense if
> this is only intended to be used on the inner side of a nestloop.

Does some comment need to be updated?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: LATERAL