Обсуждение: LATERAL
I've attempted to search the archives for references to the SQL LATERAL feature, which AIUI is fairly-frequently requested. Most of the discussion that I've found harks back to 2003, and that seems to discuss more the need for eventual support of the feature than exactly what it is or how to implement it. Looking around the web a little, I found these links: http://farrago.sourceforge.net/design/CollectionTypes.html http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/ Based on reading through this discussion, it appears that LATERAL is mostly a bit of syntactic sugar that requests that the parser allow you to reference tables at the same query level. Assuming that the necessary executor support were present (which it's currently not), it's unclear to me why one couldn't simply allow such references unconditionally. We currently explicitly block such references in parse_clause.c, but the comments indicate that this is for reasons of standards-conformance, not semantic ambiguity. It appears that we're not completely without the ability to handle queries of this type. For example, this works: select g, generate_series(1,g) as h from generate_series(1,10) g; But this doesn't: select g, h from generate_series(1,10) g, generate_series(1,g) h; Just for kicks, I tried removing the code that throws a syntax error on the latter query, which resulted in a core dump inside ExecEvalVar(), execQual.c:546, trying to deference a TupleTableSlot. I'm guessing that this is because it can't get access to the right variables - the plan actually looks quite sane: rhaas=# explain select g, h from generate_series(1,10) g, generate_series(1,g) h; QUERY PLAN --------------------------------------------------------------------------------Nested Loop (cost=0.00..22512.50 rows=1000000width=8) -> Function Scan on generate_series g (cost=0.00..12.50 rows=1000 width=4) -> Function Scan ongenerate_series h (cost=0.00..12.50 rows=1000 width=4) (3 rows) The first query just shows up a function scan, which means there's some projection operation happening here that isn't exposed at the plan level. I'm guessing that what needs to happen here is that the planner needs to be taught that LATERAL queries can only be implemented as a nested loop (right? unless they're not really using the LATERAL-ness...) and that the executor needs to be taught how to make the vars for the outer side of a nestloop visible to the inner side, when necessary. Has anyone poked at this at all? ...Robert
On Sun, Sep 06, 2009 at 11:59:22PM -0400, Robert Haas wrote: > I've attempted to search the archives for references to the SQL > LATERAL feature, which AIUI is fairly-frequently requested. > [snip] > Has anyone poked at this at all? I believe Andrew (RhodiumToad) Gierth is taking a look at implementing the full LATERAL feature from SQL:2008. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote: > Based on reading through this discussion, it appears that LATERAL is > mostly a bit of syntactic sugar that requests that the parser allow > you to reference tables at the same query level. Assuming that the > necessary executor support were present (which it's currently not), > it's unclear to me why one couldn't simply allow such references > unconditionally. Because joins can be reordered, whereas LATERAL creates a kind of syntactic sequence point for join reordering. To pick up your example: > But this doesn't [work]: > > select g, h from generate_series(1,10) g, generate_series(1,g) h; You need to constrain the order of the from items in some way so the "g" refers to something well-defined. That's what LATERAL does. You could argue that the parser could infer the references and the resultant join ordering restrictions automatically, but perhaps it was deemed that an explicit specification would be less error-prone.
On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote: > On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote: >> Based on reading through this discussion, it appears that LATERAL is >> mostly a bit of syntactic sugar that requests that the parser allow >> you to reference tables at the same query level. Assuming that the >> necessary executor support were present (which it's currently not), >> it's unclear to me why one couldn't simply allow such references >> unconditionally. > > Because joins can be reordered, whereas LATERAL creates a kind of > syntactic sequence point for join reordering. To pick up your example: > >> But this doesn't [work]: >> >> select g, h from generate_series(1,10) g, generate_series(1,g) h; > > You need to constrain the order of the from items in some way so the "g" > refers to something well-defined. That's what LATERAL does. I don't think so. All joins constrain the join order, but none of them except FULL JOIN constrain it completely, and this is no exception. Consider this query: select * from foo f, generate_series(1, 10) g, generate_series(1, g) h WHERE f.x = g; There are two legal join orderings here: f {g h} and {f g} h, just as there would be for a three-way inner join: select * from foo f, goo g, hoo h WHERE f.x = g.x and g.x = h.x; I believe that the join-order restrictions imposed by a LATERAL SRF are identical to those that would be imposed by an inner join against a table with join clauses referencing the same tables used in the inputs to the SRF. What is different is that there is only one possible method of implementing the join, namely, for each outer row, evaluate the SRF. We can't hash or merge, and we can't swap the inner and outer rels, but we do still have a choice as to whether to join against h before or after joining against f. > You could argue that the parser could infer the references and the > resultant join ordering restrictions automatically, but perhaps it was > deemed that an explicit specification would be less error-prone. Well, the irony is that our current code does already infer the references - and then disallows them. It seems to me that if we implement LATERAL(), we're basically going to be conditionalizing those checks on whether the relevant subexpression has been wrapped in LATERAL or not. Introducing a new keyword would make sense if it were possible to write a query that is valid without LATERAL(), but means something different with LATERAL() - but I'm currently unable to devise such a scenario. Whether this is for want of creativity or because none exists I'm not entirely sure. Any ideas? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote: >> You could argue that the parser could infer the references and the >> resultant join ordering restrictions automatically, but perhaps it was >> deemed that an explicit specification would be less error-prone. > Well, the irony is that our current code does already infer the > references - and then disallows them. Because as often as not, they're mistakes. Please don't think you're smarter than the spec here. regards, tom lane
On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote: >>> You could argue that the parser could infer the references and the >>> resultant join ordering restrictions automatically, but perhaps it was >>> deemed that an explicit specification would be less error-prone. > >> Well, the irony is that our current code does already infer the >> references - and then disallows them. > > Because as often as not, they're mistakes. Please don't think > you're smarter than the spec here. You're frequently the first to criticize the spec, but I have no interest in second-guessing whatever behavior the spec specifies for this construct. I'm just trying to understand it, and as far as I can tell, LATERAL() is just a piece of syntactic sugar that allows expressions within to reference FROM items at the same query level. As far as I can tell, it doesn't change the semantic of any legal queries - it just renders legal notation that otherwise would be rejected. But is my understanding correct? I haven't got a copy of the spec, so that's a bit of a handicap. If someone who does can look this up and comment on how it's supposed to work, I would certainly appreciate that. My understanding of it is currently based on random articles cherry-picked around the Internet and a handful of emails from archives.postgresql.org, which seems a little thin. ...Robert
Robert Haas escribió: > I haven't got a copy of the spec, so that's a bit of a handicap. If > someone who does can look this up and comment on how it's supposed to > work, I would certainly appreciate that. My understanding of it is > currently based on random articles cherry-picked around the Internet > and a handful of emails from archives.postgresql.org, which seems a > little thin. http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert, * Robert Haas (robertmhaas@gmail.com) wrote: > On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > > Because as often as not, they're mistakes. Please don't think > > you're smarter than the spec here. > > You're frequently the first to criticize the spec, but I have no > interest in second-guessing whatever behavior the spec specifies for > this construct. I'm not entirely sure you followed what Tom was getting at here. If you did, feel free to ignore me. > I'm just trying to understand it, and as far as I can > tell, LATERAL() is just a piece of syntactic sugar that allows > expressions within to reference FROM items at the same query level. What I'm gathering is that this may be correct, though I don't know for sure. The point I think Tom was making is that even if it *is* just syntactic sugar, we don't want to allow expressions to reference FROM items at the same query level *unless* LATERAL is specified. Your earlier comments sounded like you would want to implement allowing expressions to refer to FROM items at the same query level without LATERAL being specified. > I haven't got a copy of the spec, so that's a bit of a handicap. If > someone who does can look this up and comment on how it's supposed to > work, I would certainly appreciate that. My understanding of it is > currently based on random articles cherry-picked around the Internet > and a handful of emails from archives.postgresql.org, which seems a > little thin. You can get a 'draft' that's very close to the spec pretty easily.. Just do '??sql' in IRC sometime.. Enjoy, Stephen
On Mon, Sep 7, 2009 at 9:12 PM, Stephen Frost<sfrost@snowman.net> wrote: > Robert, > > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> > Because as often as not, they're mistakes. Please don't think >> > you're smarter than the spec here. >> >> You're frequently the first to criticize the spec, but I have no >> interest in second-guessing whatever behavior the spec specifies for >> this construct. > > I'm not entirely sure you followed what Tom was getting at here. If you > did, feel free to ignore me. > >> I'm just trying to understand it, and as far as I can >> tell, LATERAL() is just a piece of syntactic sugar that allows >> expressions within to reference FROM items at the same query level. > > What I'm gathering is that this may be correct, though I don't know for > sure. The point I think Tom was making is that even if it *is* just > syntactic sugar, we don't want to allow expressions to reference FROM > items at the same query level *unless* LATERAL is specified. Your > earlier comments sounded like you would want to implement allowing > expressions to refer to FROM items at the same query level without > LATERAL being specified. Fair enough. I think I started to drift off in the direction of making that argument, but it wasn't really my point. The original point I was trying to make is that we may not need to invent any kind of new name-resolution or scoping in order to make LATERAL() work. Instead, LATERAL() can just do everything normally with the exception of not throwing the following errors when they would otherwise be thrown: subquery in FROM cannot refer to other relations of the same query level function expression in FROM cannot refer to other relations of the same query level I'm not sure if I'm right about this, but if I am, it seems likely to be a pretty straightforward change. >> I haven't got a copy of the spec, so that's a bit of a handicap. If >> someone who does can look this up and comment on how it's supposed to >> work, I would certainly appreciate that. My understanding of it is >> currently based on random articles cherry-picked around the Internet >> and a handful of emails from archives.postgresql.org, which seems a >> little thin. > > You can get a 'draft' that's very close to the spec pretty easily.. > Just do '??sql' in IRC sometime.. Thanks for the tip. ...Robert
* Robert Haas (robertmhaas@gmail.com) wrote: > Fair enough. I think I started to drift off in the direction of > making that argument, but it wasn't really my point. To be honest, I'm not sure I agree with Tom here on the value of requiring a keyword to tell the system that you really mean what you wrote. On the other hand, it sounds like the spec is pretty clear on this, and I don't feel we should violate it just because we think it's being silly on this point. > The original > point I was trying to make is that we may not need to invent any kind > of new name-resolution or scoping in order to make LATERAL() work. > Instead, LATERAL() can just do everything normally with the exception > of not throwing the following errors when they would otherwise be > thrown: I don't know for sure, but I do hope you're right. I'd certainly love to be able to do this in general.. There's a number of cases where I've had to do the hokey-pokey to get around our lack of ability to do this when using set-returning functions. > I'm not sure if I'm right about this, but if I am, it seems likely to > be a pretty straightforward change. Please continue to explore it and propose a patch. :) Thanks, Stephen
On Mon, Sep 7, 2009 at 10:08 PM, Stephen Frost<sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Fair enough. I think I started to drift off in the direction of >> making that argument, but it wasn't really my point. > > To be honest, I'm not sure I agree with Tom here on the value of > requiring a keyword to tell the system that you really mean what you > wrote. On the other hand, it sounds like the spec is pretty clear on > this, and I don't feel we should violate it just because we think it's > being silly on this point. That's pretty much where I am, too, modulo needing to better understand the aforesaid spec. >> The original >> point I was trying to make is that we may not need to invent any kind >> of new name-resolution or scoping in order to make LATERAL() work. >> Instead, LATERAL() can just do everything normally with the exception >> of not throwing the following errors when they would otherwise be >> thrown: > > I don't know for sure, but I do hope you're right. I'd certainly love > to be able to do this in general.. There's a number of cases where I've > had to do the hokey-pokey to get around our lack of ability to do this > when using set-returning functions. Me too. >> I'm not sure if I'm right about this, but if I am, it seems likely to >> be a pretty straightforward change. > > Please continue to explore it and propose a patch. :) Yeah, that's the not-so-easy part. Gotta grok this executor stuff first before I can even think about implementing this. ...Robert
On mån, 2009-09-07 at 19:06 -0400, Robert Haas wrote: > On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote: > > Because joins can be reordered, whereas LATERAL creates a kind of > > syntactic sequence point for join reordering. To pick up your example: > > > >> But this doesn't [work]: > >> > >> select g, h from generate_series(1,10) g, generate_series(1,g) h; > > > > You need to constrain the order of the from items in some way so the "g" > > refers to something well-defined. That's what LATERAL does. > > I don't think so. All joins constrain the join order, I carefully did not say "constrain the join order" but "constraint the order of the from items" and "a *syntactic* sequence point". If the order of the from items is free, then you could also write the above example as select g, h from generate_series(1,g) h, generate_series(1,10) g; but that would no longer be valid with LATERAL in there. > but none of > them except FULL JOIN constrain it completely, and this is no > exception. FWIW, the spec says somewhere that LATERAL is not allowed at or near an outer join. I'll have to read up on it again.
>>>>> "David" == David Fetter <david@fetter.org> writes: >> I've attempted to search the archives for references to the SQL>> LATERAL feature, which AIUI is fairly-frequently requested.>>[snip]>> Has anyone poked at this at all? David> I believe Andrew (RhodiumToad) Gierth is taking a look atDavid> implementing the full LATERAL feature from SQL:2008. I've looked, but I've not actually had time to try any actual work on implementation, so anyone else who fancies a go shouldn't hesitate. Just to pick up on some points from the discussion: 1. LATERAL has to be explicit because it changes the scope of references. For example, in: ... (select ... FROM (select a AS b), (select b)) ... the "b" in the second subselect could be an outer reference, but it can't be a reference to the first subquery; whereas in: ... (select ... FROM (select a AS b), LATERAL (select b)) ... the "b" in the second subselect refers to the result of the first subselect. 2. LATERAL in general constrains both the join order and the join plan, assuming any lateral references are actually made. 3. LATERAL specifically IS allowed with left outer joins, though the syntax productions in the spec are sufficiently obscure that this isn't obvious. In general there are (as far as I can tell from the syntax rules) two ways to use it: SELECT ... FROM foo, LATERAL (bar) or SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ... Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded (syntax rule 2 for "<joined table>"). 4. LATERAL allows some optimizations that aren't currently done, either by explicitly rewriting the query, or (in theory) the optimizer itself could consider a lateral plan (I believe Oracle does this). This would apply to queries of this form: SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); which currently forces the t2/t3 join to occur first even where t1 is small; this could be rewritten with LATERAL as: SELECT ... FROM t1 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) wheret2.a=t1.a) s ON true; -- Andrew (irc:RhodiumToad)
On Tue, Sep 8, 2009 at 6:29 PM, Andrew Gierth<andrew@tao11.riddles.org.uk> wrote: >>>>>> "David" == David Fetter <david@fetter.org> writes: > > >> I've attempted to search the archives for references to the SQL > >> LATERAL feature, which AIUI is fairly-frequently requested. > >> [snip] > >> Has anyone poked at this at all? > > David> I believe Andrew (RhodiumToad) Gierth is taking a look at > David> implementing the full LATERAL feature from SQL:2008. > > I've looked, but I've not actually had time to try any actual work on > implementation, so anyone else who fancies a go shouldn't hesitate. Thanks for your thoughts - I appreciate it. > Just to pick up on some points from the discussion: > > 1. LATERAL has to be explicit because it changes the scope of > references. For example, in: > ... (select ... FROM (select a AS b), (select b)) ... > the "b" in the second subselect could be an outer reference, but > it can't be a reference to the first subquery; whereas in: > ... (select ... FROM (select a AS b), LATERAL (select b)) ... > the "b" in the second subselect refers to the result of the first > subselect. Can you provide a more complete example? I'm unable to construct a working example of this type. For example: rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1; ERROR: subquery in FROM cannot refer to other relations of same query level at character 50 Though this works as expected: rhaas=# select (select 1 from (select a) x, (select b) y) from t1; > 2. LATERAL in general constrains both the join order and the join > plan, assuming any lateral references are actually made. Peter seemed to be saying that LATERAL() must syntactically follow the same-level FROM items to which it refers. Is that your understanding also? > 3. LATERAL specifically IS allowed with left outer joins, though the > syntax productions in the spec are sufficiently obscure that this > isn't obvious. In general there are (as far as I can tell from the > syntax rules) two ways to use it: > > SELECT ... FROM foo, LATERAL (bar) > > or > > SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ... > > Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded > (syntax rule 2 for "<joined table>"). Makes sense to me. > 4. LATERAL allows some optimizations that aren't currently done, either > by explicitly rewriting the query, or (in theory) the optimizer itself > could consider a lateral plan (I believe Oracle does this). This would > apply to queries of this form: > > SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); > > which currently forces the t2/t3 join to occur first even where t1 is > small; this could be rewritten with LATERAL as: > > SELECT ... > FROM t1 > LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) > where t2.a=t1.a) s > ON true; Well, you haven't actually commuted the joins here - how do you have in mind for PostgreSQL to execute this? I'm guessing that it's something like a nest loop with t1 as the outer side and the lateral subquery as the inner side, so that the executor repeatedly executes "select * from t2 join t3 on t2.a = t3.a where t2.a = $1"? ...Robert
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: >> Just to pick up on some points from the discussion:>> >> 1. LATERAL has to be explicit because it changes the scope of>>references. For example, in:>> ... (select ... FROM (select a AS b), (select b)) ...>> the "b" in the second subselectcould be an outer reference, but>> it can't be a reference to the first subquery; whereas in:>> ... (select ...FROM (select a AS b), LATERAL (select b)) ...>> the "b" in the second subselect refers to the result of the first>> subselect. Robert> Can you provide a more complete example? I'm unable toRobert> construct a working example of this type. For example: Robert> rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;Robert> ERROR: subquery in FROM cannot referto other relations of same queryRobert> level at character 50 That looks like a bug to me. The spec is explicit that the inner definition of b is not in scope in the second subquery, and therefore that should parse as an outer reference. >> 2. LATERAL in general constrains both the join order and the join>> plan, assuming any lateral references are actuallymade. Robert> Peter seemed to be saying that LATERAL() must syntacticallyRobert> follow the same-level FROM items to which it refers. Is thatRobert> your understanding also? LATERAL references must be to items defined in the same FROM clause and to the left of the LATERAL. The relevant language of the spec seems to be: a) If TR is contained in a <from clause> FC with no intervening <query expression>, then the scope clause SC of TR isthe <select statement: single row> or innermost <query specification> that contains FC. The scope of a range variableof TR is the <select list>, <where clause>, <group by clause>, <having clause>, and <window clause> of SC,together with every <lateral derived table> that is simply contained in FC and is preceded by TR, and every <collectionderived table> that is simply contained in FC and is preceded by TR, and the <join condition> of all <joinedtable>s contained in SC that contain TR. If SC is the <query specification> that is the <query expression body>of a simple table query STQ, then the scope of a range variable of TR also includes the <order by clause> of STQ. >> 4. LATERAL allows some optimizations that aren't currently done, either>> by explicitly rewriting the query, or (in theory)the optimizer itself>> could consider a lateral plan (I believe Oracle does this). This would>> apply to queries ofthis form:>> >> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);>> >> which currently forces thet2/t3 join to occur first even where t1 is>> small; this could be rewritten with LATERAL as:>> >> SELECT ...>> FROM t1>> LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)>> where t2.a=t1.a)s>> ON true; Robert> Well, you haven't actually commuted the joins here - how doRobert> you have in mind for PostgreSQL to execute this? I'mRobert> guessing that it's something like a nest loop with t1 as theRobert> outer side and the lateral subqueryas the inner side, soRobert> that the executor repeatedly executesRobert> "select * from t2 join t3 on t2.a = t3.awhere t2.a = $1"? Yup. The current execution plans for this type of query are completely disastrous if t1 is small (or qualified so as to be small) and t2 and t3 are large. Having LATERAL would allow the query to be rewritten to perform reasonably; a bonus would be for the planner to consider the lateral join automatically without requiring it to be explicitly requested. -- Andrew (irc:RhodiumToad)
On Wed, Sep 9, 2009 at 11:25 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > >> 4. LATERAL allows some optimizations that aren't currently done, either > >> by explicitly rewriting the query, or (in theory) the optimizer itself > >> could consider a lateral plan (I believe Oracle does this). This would > >> apply to queries of this form: > >> > >> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); > >> > >> which currently forces the t2/t3 join to occur first even where t1 is > >> small; this could be rewritten with LATERAL as: > >> > >> SELECT ... > >> FROM t1 > >> LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) > >> where t2.a=t1.a) s > >> ON true; > > Robert> Well, you haven't actually commuted the joins here - how do > Robert> you have in mind for PostgreSQL to execute this? I'm > Robert> guessing that it's something like a nest loop with t1 as the > Robert> outer side and the lateral subquery as the inner side, so > Robert> that the executor repeatedly executes > Robert> "select * from t2 join t3 on t2.a = t3.a where t2.a = $1"? > > Yup. > > The current execution plans for this type of query are completely > disastrous if t1 is small (or qualified so as to be small) and t2 and > t3 are large. Having LATERAL would allow the query to be rewritten to > perform reasonably; a bonus would be for the planner to consider the > lateral join automatically without requiring it to be explicitly > requested. I've been turning this one over in my head. It seems to me that this is very similar to what we already do with inner index-scans, only generalized to joinrels. When we consider nested loop plans in match_unsorted_outer(), we really consider two quite different types of plans - call them parameterized and unparameterized. The unparameterized variants considered regardless of the details of the inner plan, and basically rescan the same identical inner side once per outer row, possibly with a materialize node interposed. The parameterized variants are what the inner index-scan code is looking at: they also involve rescanning the inner path, but not the same set of tuples every time. Instead, we look for cases where an index is available that can support a lookup of the specific values that the outer side needs on each pass as an index condition. Currently, however, we only consider this possibility when the inner rel is NOT a joinrel. It seems like it might be possible to change this, but it doesn't look straightforward. When the inner rel is a baserel, there's really nothing to do except grovel through the indexes looking for something useful, and then estimate the cost of using it. When the inner rel is a joinrel, it's a lot more complicated. A parameterized nested loop might be right even if only some of the rels making up the joinrel are indexed (imagine t2 IJ t3 IJ t4 where t2 and t3 are large and indexed and t4 is small and unindexed). In addition, the relations making up the inner rel could be joined in any order and using a variety of strategies. The path that is best when trying to suck down the ENTIRE inner rel doesn't necessarily bear any resemblence to the plan that is best when trying to suck down only a part of it. To some extent, this seems like a tangent as far as LATERAL is concerned: I think the primary reason why people want LATERAL (certainly, why I want it) is for SRFs, for which there isn't any choice of execution plan anyway. But it's certainly an interesting optimization problem, and I wish I had some better ideas on how to tackle it. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I've been turning this one over in my head. It seems to me that this > is very similar to what we already do with inner index-scans, only > generalized to joinrels. Right. > Currently, however, we only consider this possibility when the inner > rel is NOT a joinrel. It seems like it might be possible to change > this, but it doesn't look straightforward. Well, it's straightforward enough in theory, it's just the exponential growth in the number of join paths to consider that's a problem :-(. I think what we'd need is a heuristic to limit the paths considered. I think Andrew was suggesting that we not attempt to consider this automatically, but only when the user writes the query in a way that exposes it directly via LATERAL. While that goes against my normal instincts for the planner, it isn't unreasonable as a first step. regards, tom lane
On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Currently, however, we only consider this possibility when the inner >> rel is NOT a joinrel. It seems like it might be possible to change >> this, but it doesn't look straightforward. > > Well, it's straightforward enough in theory, it's just the exponential > growth in the number of join paths to consider that's a problem :-(. > I think what we'd need is a heuristic to limit the paths considered. [ picking this up again now that CommitFest is over ] So that I don't have to keep referring to "this possibility" and similar circumlocution, let's define an index-accelerated path (feel free to suggest a different term) to be the generalization to joinrels of what we now call a nested inner index-scan. In other words, they can only be usefully used on the inner side of a nested loop, where they'll be rescanned with different parameters for each outer row, and they can only ever win when some part of the path involves a *partial* index scan that can used one of the passed parameters as an index qual. For a fixed a set of parameters to be pushed down from the outer side, it seems to me that we could build up a set of index-accelerated paths for a joinrel {A B} by considering the combination of (1) an index-accelerated path for A joined to an index-accelerated path for B, or (2) an index-accelerated path for A joined to the cheapest total path for B. I believe we don't need to worry about path keys because this will only ever be used on the inner side of a nested loop, so the pathkeys will be ignored anyway. In other words, the first heuristic is that you can't build up an index-accelerated path for the joinrel unless there is an index-accelerated path for a least one of its baserels. That still leaves a lot of silly paths, though. In many cases, if you're thinking about joining A to {B C} using an index-accelerated path, you'd be just as well off joining to B first and then to C. So it might be that we only need to consider index-accelerated paths when there is no legal join between the LHS and a subset of the RHS. But I'm not completely sure that I'm right about this, nor whether it's easy to check for. The other problem I see here is that the bottom-up approach that we use in general is going to be difficult to apply here, because the set of paths will vary depending on what parameters are pushed down from the outer side. > I think Andrew was suggesting that we not attempt to consider this > automatically, but only when the user writes the query in a way that > exposes it directly via LATERAL. While that goes against my normal > instincts for the planner, it isn't unreasonable as a first step. Possibly, but I'm not sure we have a good enough design sketch at this point to even know whether such a restriction would be useful. Another thought, only semi-related. One of the big use cases for LATERAL in general is to use a set-returning function in the FROM clause that uses vars from a preceding FROM item. I am idly wondering if there's a reason why ExecProject is not its own node type. It seems that we have close to the same logic scattered through a whole bunch of different node types... ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Another thought, only semi-related. One of the big use cases for > LATERAL in general is to use a set-returning function in the FROM > clause that uses vars from a preceding FROM item. I am idly wondering > if there's a reason why ExecProject is not its own node type. You generally need it everywhere. Scan nodes need it because you don't want unused columns propagating upwards, and join nodes need it because the two input tuples have to be unified somehow. We do skip projection ability in a few node types, but I doubt it would be profitable to remove it from the rest. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > That still leaves a lot of silly paths, though. In many cases, if > you're thinking about joining A to {B C} using an index-accelerated > path, you'd be just as well off joining to B first and then to C. So > it might be that we only need to consider index-accelerated paths when > there is no legal join between the LHS and a subset of the RHS. Yeah. If there are no join order constraints, it's always possible to form a plan such that you join a given rel only once you have all the rels needed (to provide values for the inner indexscan) on the lefthand side. I think this is probably the main reason why the issue was not treated in the planner originally. So maybe the way to think about this is as a way of dealing with join order constraints without losing the benefits of inner indexscans. > The other problem I see here is that the bottom-up approach that we > use in general is going to be difficult to apply here, because the set > of paths will vary depending on what parameters are pushed down from > the outer side. Well, we deal with that already --- the set of possible inner indexscan paths already varies depending on what the LHS is. I think the point here is that we'd be considering an inner path that's against an LHS that it's not legal to join the inner rel to *directly*. Such a path would only be legal if we later join to that LHS at a higher join level. So we'd be keeping around some tentative paths that might not ever form a valid join plan. Maybe we should turn around the way that inner indexscan paths are made. Currently we form them on-the-fly while considering a valid join combination. Maybe we should build them all at the first level (driving this off the set of available join clauses for each base rel) and mark each such path as "requires a join to this other set of rels to be valid". But then we'd go ahead and join such paths to *other* rels, keeping the resulting join paths still marked as requiring the same future join. Once that join actually happens, the resulting path becomes fully valid. Only a join to a proper subset of the future-join requirement would be disallowed meanwhile. I'm not even sure this would be slower or more complicated than what we do now --- if you look at the logic that caches potential inner indexscan plans, it's almost doing this already. It would result in considering more join paths, but only ones that have some plausible use. regards, tom lane
On Sat, Oct 17, 2009 at 10:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> That still leaves a lot of silly paths, though. In many cases, if >> you're thinking about joining A to {B C} using an index-accelerated >> path, you'd be just as well off joining to B first and then to C. So >> it might be that we only need to consider index-accelerated paths when >> there is no legal join between the LHS and a subset of the RHS. > > Yeah. If there are no join order constraints, it's always possible to > form a plan such that you join a given rel only once you have all the > rels needed (to provide values for the inner indexscan) on the lefthand > side. I think this is probably the main reason why the issue was not > treated in the planner originally. So maybe the way to think about > this is as a way of dealing with join order constraints without losing > the benefits of inner indexscans. > >> The other problem I see here is that the bottom-up approach that we >> use in general is going to be difficult to apply here, because the set >> of paths will vary depending on what parameters are pushed down from >> the outer side. > > Well, we deal with that already --- the set of possible inner indexscan > paths already varies depending on what the LHS is. I think the point > here is that we'd be considering an inner path that's against an LHS > that it's not legal to join the inner rel to *directly*. Such a path > would only be legal if we later join to that LHS at a higher join level. > So we'd be keeping around some tentative paths that might not ever form > a valid join plan. > > Maybe we should turn around the way that inner indexscan paths are > made. Currently we form them on-the-fly while considering a valid > join combination. Maybe we should build them all at the first level > (driving this off the set of available join clauses for each base rel) > and mark each such path as "requires a join to this other set of rels > to be valid". But then we'd go ahead and join such paths to *other* > rels, keeping the resulting join paths still marked as requiring the > same future join. Once that join actually happens, the resulting path > becomes fully valid. Only a join to a proper subset of the future-join > requirement would be disallowed meanwhile. > > I'm not even sure this would be slower or more complicated than what we > do now --- if you look at the logic that caches potential inner > indexscan plans, it's almost doing this already. It would result in > considering more join paths, but only ones that have some plausible use. Wow, that's pretty sneaky. I had to read this email twice before I understood what you were proposing. It sounds to me like it will work. I'm not 100% sure what the impact on planner performance will be, but it's at least plausible that it will be OK. I think you should only ever join an incomplete inner-indexscan path to (1) another inner-indexscan path or (2) the cheapest total path for *exactly* the future-join requirement. Anything else doesn't seem productive. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I think you should only ever join an incomplete inner-indexscan path > to (1) another inner-indexscan path or (2) the cheapest total path for > *exactly* the future-join requirement. Anything else doesn't seem > productive. I don't see any reason to constrain the form of joins before the future-join requirement. Once you get to the join that satisfies that requirement, obviously you must implement it as a nestloop with the inner-indexscan path on the inside. But there shouldn't be any more constraint beyond that one for that join, either:* you might want a fast-start plan not cheapest total* sortorderings of the outer path might be useful We know that sort ordering of the inner indexscan or its joins won't be useful once we reach the future-join level, so it might be possible to discard paths that are not cheapest total cost below that level. The problem is to distinguish sort orderings that are useful within joins below that level. You could avoid that if you could convince yourself that there's no point in ever doing a mergejoin at such a level, but I don't think I believe that. It may well be that there's no point in being too tense about this issue anyway. The planner will only consider early joins to rels that have join clauses to the rel with the inner-indexscan path. There probably won't be that many of them. regards, tom lane
On Sun, Oct 18, 2009 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think you should only ever join an incomplete inner-indexscan path >> to (1) another inner-indexscan path or (2) the cheapest total path for >> *exactly* the future-join requirement. Anything else doesn't seem >> productive. > > I don't see any reason to constrain the form of joins before the > future-join requirement. Once you get to the join that satisfies > that requirement, obviously you must implement it as a nestloop > with the inner-indexscan path on the inside. But there shouldn't > be any more constraint beyond that one for that join, either: > * you might want a fast-start plan not cheapest total > * sort orderings of the outer path might be useful > > We know that sort ordering of the inner indexscan or its joins > won't be useful once we reach the future-join level, so it might > be possible to discard paths that are not cheapest total cost > below that level. Yeah, this was what I was driving at, but I got turned around in my head and was explaining it incorrectly. > The problem is to distinguish sort orderings > that are useful within joins below that level. You could avoid that > if you could convince yourself that there's no point in ever doing > a mergejoin at such a level, but I don't think I believe that. > > It may well be that there's no point in being too tense about this issue > anyway. The planner will only consider early joins to rels that have > join clauses to the rel with the inner-indexscan path. There probably > won't be that many of them. You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? ... especially when there are partial index scans involved) but I think your last point here is well-taken. We've certainly turned a corner from "it's just the exponential growth in the number of join paths to consider that's a problem". :-) ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > You could probably convince me that a merge join is not going to be > too useful (how often can you want a merge join on the inner side of a > nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. But anyway I think we're on the same page here: what we ought to do is try implementing this scheme without any extra restrictions on what it considers, and see what the performance is like. We can try to limit what it considers if it turns out not to work well in the simplest form. regards, tom lane
On Sun, Oct 18, 2009 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> You could probably convince me that a merge join is not going to be >> too useful (how often can you want a merge join on the inner side of a >> nested loop? > > Why not? As Andrew pointed out, what we're really trying to accomplish > here is consider sub-join plans that are parameterized by a value > obtained from an outer relation. I think we shouldn't artificially > limit what we consider. > > But anyway I think we're on the same page here: what we ought to do is > try implementing this scheme without any extra restrictions on what it > considers, and see what the performance is like. We can try to limit > what it considers if it turns out not to work well in the simplest > form. And then when we get done we can consider implementing $SUBJECT, which is no longer what this thread is about at all. :-) ...Robert
On Sun, Oct 18, 2009 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> You could probably convince me that a merge join is not going to be >> too useful (how often can you want a merge join on the inner side of a >> nested loop? > > Why not? As Andrew pointed out, what we're really trying to accomplish > here is consider sub-join plans that are parameterized by a value > obtained from an outer relation. I think we shouldn't artificially > limit what we consider. Am I understanding you right that a typical case of this might be something like nested loop index scan expecting 1 record merge join index scan on partial index where col = outer.foo and col2 between a and b some other scan or nested loop index scan expecting 1 record merge join index scan on <col1,col2> where col1 = outer.foo and col2 between a and b some other scan Ie, where the nested loop is a degenerate nested loop which only expects a single value and provides a parameter which allows some partial index to work or allows for some other index scan by providing a higher order key element? -- greg
Greg Stark <gsstark@mit.edu> writes: > nested loop > index scan expecting 1 record > merge join > index scan on <col1,col2> where col1 = outer.foo and col2 > between a and b > some other scan > Ie, where the nested loop is a degenerate nested loop which only > expects a single value and provides a parameter which allows some > partial index to work or allows for some other index scan by providing > a higher order key element? Right. I don't see any particular reason to assume the inner path is iterated only once, either. If the key value coming from the outer path is sufficiently useful, this could be a win even with multiple iterations, as compared to having to scan the whole of some large relation or other ... regards, tom lane
>>>>> "Greg" == Greg Stark <gsstark@mit.edu> writes: >> Why not? As Andrew pointed out, what we're really trying to>> accomplish here is consider sub-join plans that are parameterized>>by a value obtained from an outer relation. I think we shouldn't>> artificially limit what we consider. Greg> Am I understanding you right that a typical case of this mightGreg> be something like Greg> nested loopGreg> index scan expecting 1 recordGreg> merge joinGreg> index scan on partial index wherecol = outer.foo and col2Greg> between a and bGreg> some other scan no, because you could never pick the partial index at plan time. Greg> or Greg> nested loopGreg> index scan expecting 1 recordGreg> merge joinGreg> index scan on <col1,col2> wherecol1 = outer.foo and col2Greg> between a and bGreg> some other scan Greg> Ie, where the nested loop is a degenerate nested loop whichGreg> only expects a single value and provides a parameterwhichGreg> allows some partial index to work or allows for some otherGreg> index scan by providing a higher orderkey element? The nested loop does NOT have to be degenerate. Consider queries of this form: select * from small left join (big1 join big2 on (big1.id=big2.id)) on (small.id=big1.id); Right now, the only way pg can plan this is to do a hashjoin or mergejoin of the _entire content of big1 and big2_ and join the result against "small" (again in a hashjoin or mergejoin plan). This becomes excessively slow compared to the "ideal" plan: nested loop seqscan on small nested loop indexscan on big1 where id=small.id indexscan on big2 whereid=small.id (or big1.id which is equiv) (The same argument applies if "small" is not actually small but has restriction clauses) -- Andrew (irc:RhodiumToad)
On Sun, Oct 18, 2009 at 2:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> You could probably convince me that a merge join is not going to be >> too useful (how often can you want a merge join on the inner side of a >> nested loop? > > Why not? As Andrew pointed out, what we're really trying to accomplish > here is consider sub-join plans that are parameterized by a value > obtained from an outer relation. I think we shouldn't artificially > limit what we consider. > > But anyway I think we're on the same page here: what we ought to do is > try implementing this scheme without any extra restrictions on what it > considers, and see what the performance is like. We can try to limit > what it considers if it turns out not to work well in the simplest > form. You mention what "we" ought to do here, so - is this something that you're planning to have a go at? Another question I have - while generalizing the inner-indexscan machinery is an interesting join optimization technique, I'm thinking that it actually has very little to do with LATERAL. Is there any reason to suppose that one or the other needs to be done first? ...Robert
2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>: > Right now, the only way pg can plan this is to do a hashjoin or > mergejoin of the _entire content of big1 and big2_ and join the > result against "small" (again in a hashjoin or mergejoin plan). > This becomes excessively slow compared to the "ideal" plan: > > nested loop > seqscan on small > nested loop > indexscan on big1 where id=small.id > indexscan on big2 where id=small.id (or big1.id which is equiv) > > (The same argument applies if "small" is not actually small but has > restriction clauses) I have a similar issue on my mind, but is this the same as the topic? SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) The ideal plan is SeqScan on small with filtering sub query aggregate on large by small.id but the actual plan is full aggregate on large since the planner doesn't push down outer qual to aggregate node. The output will discard almost all of agged's output. Regards, -- Hitoshi Harada
On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>: >> Right now, the only way pg can plan this is to do a hashjoin or >> mergejoin of the _entire content of big1 and big2_ and join the >> result against "small" (again in a hashjoin or mergejoin plan). >> This becomes excessively slow compared to the "ideal" plan: >> >> nested loop >> seqscan on small >> nested loop >> indexscan on big1 where id=small.id >> indexscan on big2 where id=small.id (or big1.id which is equiv) >> >> (The same argument applies if "small" is not actually small but has >> restriction clauses) > > I have a similar issue on my mind, but is this the same as the topic? > > SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY > large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) > > The ideal plan is SeqScan on small with filtering sub query aggregate > on large by small.id but the actual plan is full aggregate on large > since the planner doesn't push down outer qual to aggregate node. The > output will discard almost all of agged's output. I just tried this and it works for me. create table foo (id serial, name varchar, primary key (id)); create table bar (id serial, foo_id integer references foo (id), name varchar, primary key (id)); insert into foo (name) select random()::varchar from generate_series(1,1000); insert into bar (foo_id, name) select (g%10)+1, random()::varchar from generate_series(1,10000) g; explain select * from foo inner join (select foo_id, sum(1) from bar group by 1) x on foo.id = x.foo_id where x.foo_id = 1; ...Robert
2009/12/20 Robert Haas <robertmhaas@gmail.com>: > On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> 2009/10/20 Andrew Gierth <andrew@tao11.riddles.org.uk>: >>> Right now, the only way pg can plan this is to do a hashjoin or >>> mergejoin of the _entire content of big1 and big2_ and join the >>> result against "small" (again in a hashjoin or mergejoin plan). >>> This becomes excessively slow compared to the "ideal" plan: >>> >>> nested loop >>> seqscan on small >>> nested loop >>> indexscan on big1 where id=small.id >>> indexscan on big2 where id=small.id (or big1.id which is equiv) >>> >>> (The same argument applies if "small" is not actually small but has >>> restriction clauses) >> >> I have a similar issue on my mind, but is this the same as the topic? >> >> SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY >> large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) >> >> The ideal plan is SeqScan on small with filtering sub query aggregate >> on large by small.id but the actual plan is full aggregate on large >> since the planner doesn't push down outer qual to aggregate node. The >> output will discard almost all of agged's output. > > I just tried this and it works for me. > > create table foo (id serial, name varchar, primary key (id)); > create table bar (id serial, foo_id integer references foo (id), name > varchar, primary key (id)); > insert into foo (name) select random()::varchar from generate_series(1,1000); > insert into bar (foo_id, name) select (g%10)+1, random()::varchar from > generate_series(1,10000) g; > explain select * from foo inner join (select foo_id, sum(1) from bar > group by 1) x on foo.id = x.foo_id where x.foo_id = 1; > > ...Robert > Ah your example works for me, too. My issue is: explain select * from foo inner join (select foo_id, sum(1) from bar group by 1) x on foo.id = x.foo_id where foo.id = 1; where foo.id = 1 (not where x.foo_id = 1). And I now figured out it's another problem. Regards, -- Hitoshi Harada
On Thu, Dec 17, 2009 at 10:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Another question I have - while generalizing the inner-indexscan > machinery is an interesting join optimization technique, I'm thinking > that it actually has very little to do with LATERAL. Is there any > reason to suppose that one or the other needs to be done first? And the winner is... yes. Or at least, I think so. One of the major reasons why people want LATERAL() is for SRFs, but currently, even if you beat the code into allowing a SRF with an outer reference, the planner can easily be persuaded to run the SRF on the outer side of a join with the dependency as the inner side, which ain't gonna work. (Even you jigger the query so that the planner gets them on the correct sides of the join, the executor fails, but that's a different problem.) The idea Tom came up with back in October is to allow paths to be tagged with a set of rels to which they must in the future be joined in order for the path to be allowable. The point of that exercise was to generalize the current inner-indexscan machinery so that we can create that type of plan in match_unsorted_outer() even when the inner side is a joinrel. But, it strikes me that what we need to allow a function scan with an outer reference is remarkably similar - the function scan can only be used as the inner side of a nestloop with a certain set of rels on the outer side. On the other hand, it's not exactly the same, either. In the case of a construct like A LJ (B IJ C), partial-index scan paths for B and C will require a subsequent nest-join to A to become fully valid, but there will also be other paths that don't. But for something like "A, LATERAL (some_srf(A.x))", the ONLY path for the "rel" defined by some_srf(A.x) has a future-join requirement of {A}. It's not clear to me whether there's anything useful that can be done with this knowledge. Incidentally, the reason why the executor chokes trying to execute a SRF with an outer reference is because ExecEvalVar() craps out trying to dereference a null TupleTableSlot. If I'm understanding this correctly, that, in turn, happens because the variable that we're trying to deference is marked as neither INNER nor OUTER, so it's assumed to be from a scan, but there's no scan node. Going even further from my area of actually understanding what's going on, I think this needs to be fixed by adjusting setrefs.c. Allowing LATERAL(), or for that matter the generalized inner-index scan stuff, will I think mean that set_inner_join_references() will need to handle a lot more cases than it current does. I don't understand this code well enough to begin to speculate as to what should happen here. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Incidentally, the reason why the executor chokes trying to execute a > SRF with an outer reference is because ExecEvalVar() craps out trying > to dereference a null TupleTableSlot. If I'm understanding this > correctly, that, in turn, happens because the variable that we're > trying to deference is marked as neither INNER nor OUTER, so it's > assumed to be from a scan, but there's no scan node. Going even > further from my area of actually understanding what's going on, I > think this needs to be fixed by adjusting setrefs.c. Well, no: we can't handle such references as OUTER vars because the OUTER slot is likely to be in use already in the sub-join. It would be even messier if you wanted several references to different outer relations. I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. The mechanisms that deal with forcing rescans of subplans affected by a changed parameter value would be very helpful here too. regards, tom lane
On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Incidentally, the reason why the executor chokes trying to execute a >> SRF with an outer reference is because ExecEvalVar() craps out trying >> to dereference a null TupleTableSlot. If I'm understanding this >> correctly, that, in turn, happens because the variable that we're >> trying to deference is marked as neither INNER nor OUTER, so it's >> assumed to be from a scan, but there's no scan node. Going even >> further from my area of actually understanding what's going on, I >> think this needs to be fixed by adjusting setrefs.c. > > Well, no: we can't handle such references as OUTER vars because the > OUTER slot is likely to be in use already in the sub-join. It would be > even messier if you wanted several references to different outer > relations. Oh. Yeah. > I believe the correct approach is probably to treat values that need to > be propagated into the inner side as executor parameters. This could > replace the existing, rather crocky, management of values passed into a > nestloop inner indexscan. The mechanisms that deal with forcing rescans > of subplans affected by a changed parameter value would be very helpful > here too. What is the best place to look for the existing, rather crocky code? I have to admit that the whole mechanism by which paths get transformed into plans and handed off to the executor is still rather opaque to me. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I believe the correct approach is probably to treat values that need to >> be propagated into the inner side as executor parameters. �This could >> replace the existing, rather crocky, management of values passed into a >> nestloop inner indexscan. > What is the best place to look for the existing, rather crocky code? Follow the second argument of ExecReScan from nodeNestloop to nodeIndexscan. regards, tom lane
On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I believe the correct approach is probably to treat values that need to >>> be propagated into the inner side as executor parameters. This could >>> replace the existing, rather crocky, management of values passed into a >>> nestloop inner indexscan. > >> What is the best place to look for the existing, rather crocky code? > > Follow the second argument of ExecReScan from nodeNestloop to > nodeIndexscan. Yeah, this is grotty. It appears that the comment introducing ExecReScan() is somewhat incorrect. It asserts that exprCtxt is used only
On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> I believe the correct approach is probably to treat values that need to >>>> be propagated into the inner side as executor parameters. This could >>>> replace the existing, rather crocky, management of values passed into a >>>> nestloop inner indexscan. >> >>> What is the best place to look for the existing, rather crocky code? >> >> Follow the second argument of ExecReScan from nodeNestloop to >> nodeIndexscan. > > Yeah, this is grotty. It appears that the comment introducing > ExecReScan() is somewhat incorrect. It asserts that exprCtxt is used > only Sigh. ...is used only for index scans. However, it's actually also used for bitmap scans (both heap and index) and TID scans. Also, there appears to be an effort by nodes that don't use exprCtxt directly to propagate down through the node tree, which doesn't seem to make much sense if this is only intended to be used on the inner side of a nestloop. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: >> Yeah, this is grotty. �It appears that the comment introducing >> ExecReScan() is somewhat incorrect. �It asserts that exprCtxt is used >> only > Sigh. > ...is used only for index scans. However, it's actually also used for > bitmap scans (both heap and index) and TID scans. Yeah, the comment was probably correct when written. > Also, there appears > to be an effort by nodes that don't use exprCtxt directly to propagate > down through the node tree, which doesn't seem to make much sense if > this is only intended to be used on the inner side of a nestloop. That's just dead code, which is a good thing because the coverage is pretty incomplete. regards, tom lane
Robert Haas wrote: > On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Robert Haas <robertmhaas@gmail.com> writes: > >>> On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>>> I believe the correct approach is probably to treat values that need to > >>>> be propagated into the inner side as executor parameters. ?This could > >>>> replace the existing, rather crocky, management of values passed into a > >>>> nestloop inner indexscan. > >> > >>> What is the best place to look for the existing, rather crocky code? > >> > >> Follow the second argument of ExecReScan from nodeNestloop to > >> nodeIndexscan. > > > > Yeah, this is grotty. ?It appears that the comment introducing > > ExecReScan() is somewhat incorrect. ?It asserts that exprCtxt is used > > only > > Sigh. > > ...is used only for index scans. However, it's actually also used for > bitmap scans (both heap and index) and TID scans. Also, there appears > to be an effort by nodes that don't use exprCtxt directly to propagate > down through the node tree, which doesn't seem to make much sense if > this is only intended to be used on the inner side of a nestloop. Does some comment need to be updated? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +