Обсуждение: Review: UNNEST (and other functions) WITH ORDINALITY
On 17 June 2013 06:33, David Fetter <david@fetter.org> wrote: >> Next revision of the patch, now with more stability. Thanks, Andrew! > > Rebased vs. git master. > Here's my review of the WITH ORDINALITY patch. Overall I think that the patch is in good shape, and I think that this will be a very useful new feature, so I'm keen to see this committed. All the basic stuff is OK --- the patch applies cleanly, compiles with no warnings, and has appropriate docs and new regression tests which pass. I also tested it fairly thoroughly myself, and I couldn't break it. Everything worked as I expected, and it works nicely with LATERAL. I have a few minor comments that should be considered before passing it on to a committer: 1). In func.sgml, the new doc example "unnest WITH ORDINALITY" is mislablled, since it it's not actually an example of unnest(). Also it doesn't really belong in that code example block, which is about generate_subscripts(). I think that there should probably be a new sub-section starting at that point for WITH ORDINALITY. Perhaps it should start with a brief paragraph explaining how WITH ORDINALITY can be applied to functions in the FROM clause of a query. [Actually it appears that WITH ORDINALITY works with non-SRF's too, but that's less useful, so I think that the SRF section is probably still the right place to document this] It might also be worth mentioning here that currently WITH ORDINALITY is not supported for functions that return records. In the code example itself, the prompt should be trimmed down to match the previous examples. 2). In the SELECT docs, where function_name is documented, I would be tempted to start a new paragraph for the sentence starting "If the function has been defined as returning the record data type...", since that's really a separate syntax. I think that should also make mention of the fact that WITH ORDINALITY is not currently supported in that case. 3). I think it would be good to have a more meaningful default name for the new ordinality column, rather than "?column?". In many cases the user might then choose to not bother giving it an alias. It could simply be called ordinality by default, since that's non-reserved. 4). In gram.y, WITH_ORDINALITY should not be listed as an ordinary keyword, but instead should be listed as a token below that (just before WITH_TIME). 5). In plannodes.h, FunctionScan's new field should probably have a comment. 6). In parsenodes.h, the field added to RangeTblEntry is only valid for function RTEs, so it should be moved to that group of fields and renamed appropriately (unless you're expecting to extend it to other RTE kinds in the future?). Logically then, the new check for ordinality in inline_set_returning_function() should be moved so that it is after the check that the RTE actually a function RTE, and in addRangeTableEntryForFunction() the RTE's ordinality field should be set at the start along with the other function-related fields. 7). In execnodes.h, the new fields added to FunctionScanState should be documented in the comment block above. Overall, as I said, I really like this feature, and I think it's not far from being commitable. Regards, Dean
On Tue, Jun 18, 2013 at 11:36:08AM +0100, Dean Rasheed wrote: > On 17 June 2013 06:33, David Fetter <david@fetter.org> wrote: > >> Next revision of the patch, now with more stability. Thanks, Andrew! > > > > Rebased vs. git master. > > Here's my review of the WITH ORDINALITY patch. > > Overall I think that the patch is in good shape, and I think that this > will be a very useful new feature, so I'm keen to see this committed. > > All the basic stuff is OK --- the patch applies cleanly, compiles with > no warnings, and has appropriate docs and new regression tests which > pass. I also tested it fairly thoroughly myself, and I couldn't break > it. Everything worked as I expected, and it works nicely with LATERAL. > > I have a few minor comments that should be considered before passing > it on to a committer: > > > 1). In func.sgml, the new doc example "unnest WITH ORDINALITY" is > mislablled, since it it's not actually an example of unnest(). Fixed in patch attached. > Also it doesn't really belong in that code example block, which is > about generate_subscripts(). I think that there should probably be a > new sub-section starting at that point for WITH ORDINALITY. Perhaps > it should start with a brief paragraph explaining how WITH > ORDINALITY can be applied to functions in the FROM clause of a > query. How's the attached? > [Actually it appears that WITH ORDINALITY works with non-SRF's too, > but that's less useful, so I think that the SRF section is probably > still the right place to document this] As of this patch, that's now both in the SELECT docs and the SRF section. > It might also be worth mentioning here that currently WITH ORDINALITY > is not supported for functions that return records. Added. > In the code example itself, the prompt should be trimmed down to match > the previous examples. Done. > 2). In the SELECT docs, where function_name is documented, I would be > tempted to start a new paragraph for the sentence starting "If the > function has been defined as returning the record data type...", since > that's really a separate syntax. I think that should also make mention > of the fact that WITH ORDINALITY is not currently supported in that > case. Done-ish. What do you think? > 3). I think it would be good to have a more meaningful default name > for the new ordinality column, rather than "?column?". In many cases > the user might then choose to not bother giving it an alias. It could > simply be called ordinality by default, since that's non-reserved. I don't think this needs doing, per spec. The column name needs to be unique, and if someone happens to name an output column of a function, "?column?", that's really not our problem. > 4). In gram.y, WITH_ORDINALITY should not be listed as an ordinary > keyword, but instead should be listed as a token below that (just > before WITH_TIME). > Done. > 5). In plannodes.h, FunctionScan's new field should probably have a comment. Done. > 6). In parsenodes.h, the field added to RangeTblEntry is only valid > for function RTEs, so it should be moved to that group of fields and > renamed appropriately (unless you're expecting to extend it to other > RTE kinds in the future?). Nope, and done. > Logically then, the new check for ordinality in > inline_set_returning_function() should be moved so that it is after > the check that the RTE actually a function RTE, and in > addRangeTableEntryForFunction() the RTE's ordinality field should be > set at the start along with the other function-related fields. We don't actually get to inline_set_returning_function unless it's a function RTE. > 7). In execnodes.h, the new fields added to FunctionScanState should > be documented in the comment block above. Done. > Overall, as I said, I really like this feature, and I think it's not > far from being commitable. Great! :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On 19 June 2013 04:11, David Fetter <david@fetter.org> wrote: > On Tue, Jun 18, 2013 at 11:36:08AM +0100, Dean Rasheed wrote: >> On 17 June 2013 06:33, David Fetter <david@fetter.org> wrote: >> >> Next revision of the patch, now with more stability. Thanks, Andrew! >> > >> > Rebased vs. git master. >> >> Here's my review of the WITH ORDINALITY patch. >> >> Overall I think that the patch is in good shape, and I think that this >> will be a very useful new feature, so I'm keen to see this committed. >> >> All the basic stuff is OK --- the patch applies cleanly, compiles with >> no warnings, and has appropriate docs and new regression tests which >> pass. I also tested it fairly thoroughly myself, and I couldn't break >> it. Everything worked as I expected, and it works nicely with LATERAL. >> >> I have a few minor comments that should be considered before passing >> it on to a committer: >> >> >> 1). In func.sgml, the new doc example "unnest WITH ORDINALITY" is >> mislablled, since it it's not actually an example of unnest(). > > Fixed in patch attached. > >> Also it doesn't really belong in that code example block, which is >> about generate_subscripts(). I think that there should probably be a >> new sub-section starting at that point for WITH ORDINALITY. Perhaps >> it should start with a brief paragraph explaining how WITH >> ORDINALITY can be applied to functions in the FROM clause of a >> query. > > How's the attached? > Looks good. >> [Actually it appears that WITH ORDINALITY works with non-SRF's too, >> but that's less useful, so I think that the SRF section is probably >> still the right place to document this] > > As of this patch, that's now both in the SELECT docs and the SRF > section. > >> It might also be worth mentioning here that currently WITH ORDINALITY >> is not supported for functions that return records. > > Added. > >> In the code example itself, the prompt should be trimmed down to match >> the previous examples. > > Done. > Oh, on closer inspection, the previous examples mostly don't show the prompt at all, except for the last one. So perhaps it should be removed from both those places. >> 2). In the SELECT docs, where function_name is documented, I would be >> tempted to start a new paragraph for the sentence starting "If the >> function has been defined as returning the record data type...", since >> that's really a separate syntax. I think that should also make mention >> of the fact that WITH ORDINALITY is not currently supported in that >> case. > > Done-ish. What do you think? > Hmm, I fear that might have made it worse, because now it reads as if functions that return records can't be used in the FROM clause at all (at least if you don't read all the way to the end, which many people don't). I think if you just take out this change: Function calls can appear in the <literal>FROM</literal> clause. (This is especially useful for functionsthat return - result sets, but any function can be used.) This acts as + result sets, but any function except those that return + <literal>[SETOF] RECORD</literal> can be used.) This acts as then what's left is OK. >> 3). I think it would be good to have a more meaningful default name >> for the new ordinality column, rather than "?column?". In many cases >> the user might then choose to not bother giving it an alias. It could >> simply be called ordinality by default, since that's non-reserved. > > I don't think this needs doing, per spec. The column name needs to be > unique, and if someone happens to name an output column of a function, > "?column?", that's really not our problem. > I don't think the spec says anything about how the new column should be named, so it's up to us, but I do think a sensible default would be useful to save the user from having to give it an alias in many common cases. For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" would then produce a column that could be referred to in the rest of the query as file.ordinality or simply ordinality. As it stands, they'd have to write file."?column?", which is really ugly, so we're effectively forcing them to supply an alias for this column every time. I think it would be better if they only had to supply a name to resolve name conflicts, or if they wanted a different name. >> 4). In gram.y, WITH_ORDINALITY should not be listed as an ordinary >> keyword, but instead should be listed as a token below that (just >> before WITH_TIME). >> > > Done. > >> 5). In plannodes.h, FunctionScan's new field should probably have a comment. > > Done. > >> 6). In parsenodes.h, the field added to RangeTblEntry is only valid >> for function RTEs, so it should be moved to that group of fields and >> renamed appropriately (unless you're expecting to extend it to other >> RTE kinds in the future?). > > Nope, and done. > >> Logically then, the new check for ordinality in >> inline_set_returning_function() should be moved so that it is after >> the check that the RTE actually a function RTE, and in >> addRangeTableEntryForFunction() the RTE's ordinality field should be >> set at the start along with the other function-related fields. > > We don't actually get to inline_set_returning_function unless it's a > function RTE. > OK, yes you're right. >> 7). In execnodes.h, the new fields added to FunctionScanState should >> be documented in the comment block above. > > Done. > There's still a comment relating the old tupdesc field in the comment block above. I think for consistency with the surrounding code, all those comments should be in header comment block (except for the NodeTag one). Everything else looks good. I think we're now down to just a few minor comment/doc issues, and the question of whether the new column should have a better default name. Regards, Dean
On Wed, Jun 19, 2013 at 01:03:42PM +0100, Dean Rasheed wrote: > On 19 June 2013 04:11, David Fetter <david@fetter.org> wrote: > > On Tue, Jun 18, 2013 at 11:36:08AM +0100, Dean Rasheed wrote: > >> On 17 June 2013 06:33, David Fetter <david@fetter.org> wrote: > >> >> Next revision of the patch, now with more stability. Thanks, Andrew! > >> > > >> > Rebased vs. git master. > >> > >> Here's my review of the WITH ORDINALITY patch. > >> > >> Overall I think that the patch is in good shape, and I think that this > >> will be a very useful new feature, so I'm keen to see this committed. > >> > >> All the basic stuff is OK --- the patch applies cleanly, compiles with > >> no warnings, and has appropriate docs and new regression tests which > >> pass. I also tested it fairly thoroughly myself, and I couldn't break > >> it. Everything worked as I expected, and it works nicely with LATERAL. > >> > >> I have a few minor comments that should be considered before passing > >> it on to a committer: > >> > >> > >> 1). In func.sgml, the new doc example "unnest WITH ORDINALITY" is > >> mislablled, since it it's not actually an example of unnest(). > > > > Fixed in patch attached. > > > >> Also it doesn't really belong in that code example block, which is > >> about generate_subscripts(). I think that there should probably be a > >> new sub-section starting at that point for WITH ORDINALITY. Perhaps > >> it should start with a brief paragraph explaining how WITH > >> ORDINALITY can be applied to functions in the FROM clause of a > >> query. > > > > How's the attached? > > > > Looks good. Thanks! > >> In the code example itself, the prompt should be trimmed down to match > >> the previous examples. > > > > Done. > > > > Oh, on closer inspection, the previous examples mostly don't show the > prompt at all, except for the last one. So perhaps it should be > removed from both those places. Done. > >> 2). In the SELECT docs, where function_name is documented, I would be > >> tempted to start a new paragraph for the sentence starting "If the > >> function has been defined as returning the record data type...", since > >> that's really a separate syntax. I think that should also make mention > >> of the fact that WITH ORDINALITY is not currently supported in that > >> case. > > > > Done-ish. What do you think? > > > > Hmm, I fear that might have made it worse, because now it reads as if > functions that return records can't be used in the FROM clause at all > (at least if you don't read all the way to the end, which many people > don't). I think if you just take out this change: > > Function calls can appear in the <literal>FROM</literal> > clause. (This is especially useful for functions that return > - result sets, but any function can be used.) This acts as > + result sets, but any function except those that return > + <literal>[SETOF] RECORD</literal> can be used.) This acts as > > then what's left is OK. Done. > >> 3). I think it would be good to have a more meaningful default name > >> for the new ordinality column, rather than "?column?". In many cases > >> the user might then choose to not bother giving it an alias. It could > >> simply be called ordinality by default, since that's non-reserved. > > > > I don't think this needs doing, per spec. The column name needs to be > > unique, and if someone happens to name an output column of a function, > > "?column?", that's really not our problem. > > > > I don't think the spec says anything about how the new column should > be named, so it's up to us, but I do think a sensible default would be > useful to save the user from having to give it an alias in many common > cases. > > For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" The spec is pretty specific about the "all or none" nature of naming in the AS clause...unless we can figure out a way of getting around it somehow. > >> 7). In execnodes.h, the new fields added to FunctionScanState should > >> be documented in the comment block above. > > > > Done. > > > > There's still a comment relating the old tupdesc field in the comment > block above. I think for consistency with the surrounding code, all > those comments should be in header comment block (except for the > NodeTag one). Done. > Everything else looks good. I think we're now down to just a few minor > comment/doc issues, and the question of whether the new column should > have a better default name. I'm weighing in on the side of a name that's like ?columnN? elsewhere in the code, i.e. one that couldn't sanely be an actual column name, whether in table, view, or SRF. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On 21 June 2013 06:54, David Fetter <david@fetter.org> wrote: >> For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" > > The spec is pretty specific about the "all or none" nature of naming > in the AS clause...unless we can figure out a way of getting around it > somehow. We already support and document the ability to provide a subset of the columns in the alias. I didn't realise that was beyond the spec, but since we already have it... > I'm weighing in on the side of a name that's like ?columnN? elsewhere > in the code, i.e. one that couldn't sanely be an actual column name, > whether in table, view, or SRF. I don't think we need to be overly concerned with coming up with a unique name for the column. Duplicate column names are fine, except if the user wants to refer to them elsewhere in the query, in which case they need to provide aliases to distinguish them, otherwise the query won't be accepted. I'd be happy if you just replaced "?column?" with ordinality in a couple of places in your original patch. Regards, Dean
On 21 June 2013 08:02, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 21 June 2013 06:54, David Fetter <david@fetter.org> wrote: >>> For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" >> >> The spec is pretty specific about the "all or none" nature of naming >> in the AS clause...unless we can figure out a way of getting around it >> somehow. > > We already support and document the ability to provide a subset of the > columns in the alias. I didn't realise that was beyond the spec, but > since we already have it... > > >> I'm weighing in on the side of a name that's like ?columnN? elsewhere >> in the code, i.e. one that couldn't sanely be an actual column name, >> whether in table, view, or SRF. > > I don't think we need to be overly concerned with coming up with a > unique name for the column. Duplicate column names are fine, except if > the user wants to refer to them elsewhere in the query, in which case > they need to provide aliases to distinguish them, otherwise the query > won't be accepted. > > I'd be happy if you just replaced "?column?" with ordinality in a > couple of places in your original patch. > Expanding on that, I think it's perfectly acceptable to allow potentially duplicate column names in this context. For the majority of simple queries the user wouldn't have to (and wouldn't feel compelled to) supply aliases. Where there was ambiguity they would be forced to do so, but people are already used to that. If I write a simple query that selects from a single unnest() with or without ordinality, I probably won't want to supply aliases. If I select from 2 unnest()'s *without* ordinality, I already have to provide aliases. If I select from 2 SRF's functions with ordinality, I won't be too surprised if I am also forced to provide aliases. Regards, Dean
Naming of ORDINALITY column (was: Re: Review: UNNEST (and other functions) WITH ORDINALITY)
От
Andrew Gierth
Дата:
OK, let's try to cover all the bases here in one go. First, the spec definition of WITH ORDINALITY simply says that the column name in the result is not equivalent to any other identifier in the same <table primary> (including the <correlation name>). It is clear that the intention of the spec is that any non-positional reference to this column (which is defined as being positionally last) requires an alias at some point, whether directly attached to the <table primary> or at an outer level. Second, all the documentation I've looked at for other databases that implement this feature (such as DB2, Teradata, etc.) takes it for granted that the user must always supply an alias, even though the syntax does not actually require one. None of the ones I've seen suggest that the ordinality column has a useful or consistent name if no alias is supplied. So, while clearly there's nothing stopping us from going beyond the spec and using a column name that people can refer to without needing an alias, it would be a significant divergence from common practice in other dbs. (iirc, it was my suggestion to David to use "?column?" in the first place for this reason.) So as I see it the options are: 1. Stick with "?column?" as a warning flag that you're not supposed to be using this without aliasing it to something. 2. Use some other fixed name like "ordinality" simply to allow people to do things like select ... from unnest(x) with ordinality; without having to bother to provide an alias, simply as a convenience, without regard for consistency with others. (This will result in a duplicate name if "x" is of a composite type containing a column called "ordinality", so the caller will have to provide an alias in that specific case or get an ambiguous reference error. Similarly if using some other SRF which defines its own return column names.) 3. Generate an actually unique name (probably pointless) 4. Something else I haven't thought of. My vote remains with option 1 here; I don't think users should be encouraged to assume that the ordinality column will have a known name. -- Andrew (irc:RhodiumToad)
On 06/23/2013 08:00 PM, Andrew Gierth wrote: > OK, let's try to cover all the bases here in one go. > 1. Stick with "?column?" as a warning flag that you're not supposed to > be using this without aliasing it to something. How do I actually supply an alias which covers both columns? What does that look like, syntactically? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 21 June 2013 08:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 21 June 2013 08:02, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 21 June 2013 06:54, David Fetter <david@fetter.org> wrote: >>>> For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" >>> >>> The spec is pretty specific about the "all or none" nature of naming >>> in the AS clause...unless we can figure out a way of getting around it >>> somehow. >> >> We already support and document the ability to provide a subset of the >> columns in the alias. I didn't realise that was beyond the spec, but >> since we already have it... >> >> >>> I'm weighing in on the side of a name that's like ?columnN? elsewhere >>> in the code, i.e. one that couldn't sanely be an actual column name, >>> whether in table, view, or SRF. >> >> I don't think we need to be overly concerned with coming up with a >> unique name for the column. Duplicate column names are fine, except if >> the user wants to refer to them elsewhere in the query, in which case >> they need to provide aliases to distinguish them, otherwise the query >> won't be accepted. >> >> I'd be happy if you just replaced "?column?" with ordinality in a >> couple of places in your original patch. >> > > Expanding on that, I think it's perfectly acceptable to allow > potentially duplicate column names in this context. For the majority > of simple queries the user wouldn't have to (and wouldn't feel > compelled to) supply aliases. Where there was ambiguity they would be > forced to do so, but people are already used to that. > > If I write a simple query that selects from a single unnest() with or > without ordinality, I probably won't want to supply aliases. > > If I select from 2 unnest()'s *without* ordinality, I already have to > provide aliases. > > If I select from 2 SRF's functions with ordinality, I won't be too > surprised if I am also forced to provide aliases. > No one else has expressed an opinion about the naming of the new column. All other review comments have been addressed, and the patch looks to be in good shape, so I'm marking this as ready for committer. IMO it's a very useful piece of new functionality. Good job! Regards, Dean
On 24 June 2013 04:29, Josh Berkus <josh@agliodbs.com> wrote: > On 06/23/2013 08:00 PM, Andrew Gierth wrote: >> OK, let's try to cover all the bases here in one go. > >> 1. Stick with "?column?" as a warning flag that you're not supposed to >> be using this without aliasing it to something. > > How do I actually supply an alias which covers both columns? What does > that look like, syntactically? > There are a number of possible syntaxes: SELECT unnest, "?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY; or SELECT unnest.unnest, unnest."?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY;unnest | ?column? --------+----------x | 1y | 2 (2 rows) SELECT t, "?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t; or SELECT t.t, t."?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t;t | ?column? ---+----------x | 1y | 2 (2 rows) SELECT val, "?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t(val); or SELECT t.val, t."?column?" FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t(val);val | ?column? -----+----------x | 1y | 2 (2 rows) SELECT val, ord FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t(val, ord); or SELECT t.val, t.ord FROM unnest(ARRAY['x','y']) WITH ORDINALITY AS t(val, ord);val | ord -----+-----x | 1y | 2 (2 rows) My suggestion was to replace "?column?" with ordinality wherever it appears above, for the user's convenience, but so far more people prefer "?column?" as a way of indicating that you're supposed to provide an alias for the column. If that's what people prefer, I don't mind --- it's still going to be a very handy new feature. Regards, Dean
Folks, (the below was already discussed on IRC) Leaving names aside on this patch, I'm wondering about a piece of functionality I have with the current unnest() and with the unnest_ordinality()[1] extension: namely, the ability to unnest several arrays "in parallel" by using unnest() in the target list. For example, given the table: lotsarrays (id serial PK,arr1 int[],arr2 numeric[],arr3 boolean[] ) I can currently do: SELECT id,unnest(arr1) as arr1,unnest(arr2) as arr2,unnest(arr3) as arr3 FROM lotsarrays; ... and if arr1, 2 and 3 are exactly the same length, this creates a coordinated dataset. I can even use the unnest_ordinality() extension function to get the ordinality of this combined dataset: SELECT id,(unnest_ordinality(arr1)).element_number as array_index,unnest(arr1) as arr1,unnest(arr2) as arr2,unnest(arr3)as arr3 FROM lotsarrays; There are reasons why this will be complicated to implement WITH ORDINALITY; DF, Andrew and I discussed them on IRC. So allowing WITH ORDINALITY in the target list is a TODO, either for later in 9.4 development, or for 9.5. So, this isn't stopping the patch; I just want a TODO for "implement WITH ORDINALITY in the target list for SRFs". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > ... and if arr1, 2 and 3 are exactly the same length, this creates a > coordinated dataset. I can even use the unnest_ordinality() extension > function to get the ordinality of this combined dataset: > SELECT id, > (unnest_ordinality(arr1)).element_number as array_index, > unnest(arr1) as arr1, > unnest(arr2) as arr2, > unnest(arr3) as arr3 > FROM lotsarrays; > There are reasons why this will be complicated to implement WITH > ORDINALITY; DF, Andrew and I discussed them on IRC. So allowing WITH > ORDINALITY in the target list is a TODO, either for later in 9.4 > development, or for 9.5. Some of the rest of us would like to hear those reasons, because my immediate reaction is that the patch must be broken by design. WITH ORDINALITY should not be needing to mess with the fundamental evaluation semantics of SRFs, but it sure sounds like it is doing so if that case doesn't work as expected. regards, tom lane
On 26 June 2013 01:22, Josh Berkus <josh@agliodbs.com> wrote: > Folks, > > (the below was already discussed on IRC) > > Leaving names aside on this patch, I'm wondering about a piece of > functionality I have with the current unnest() and with the > unnest_ordinality()[1] extension: namely, the ability to unnest several > arrays "in parallel" by using unnest() in the target list. > > For example, given the table: > > lotsarrays ( > id serial PK, > arr1 int[], > arr2 numeric[], > arr3 boolean[] > ) > > I can currently do: > > SELECT id, > unnest(arr1) as arr1, > unnest(arr2) as arr2, > unnest(arr3) as arr3 > FROM lotsarrays; > > ... and if arr1, 2 and 3 are exactly the same length, this creates a > coordinated dataset. I can even use the unnest_ordinality() extension > function to get the ordinality of this combined dataset: > > SELECT id, > (unnest_ordinality(arr1)).element_number as array_index, > unnest(arr1) as arr1, > unnest(arr2) as arr2, > unnest(arr3) as arr3 > FROM lotsarrays; > > There are reasons why this will be complicated to implement WITH > ORDINALITY; DF, Andrew and I discussed them on IRC. So allowing WITH > ORDINALITY in the target list is a TODO, either for later in 9.4 > development, or for 9.5. > > So, this isn't stopping the patch; I just want a TODO for "implement > WITH ORDINALITY in the target list for SRFs". > So if I'm understanding correctly, your issue is that WITH ORDINALITY is currently only accepted on SRFs in the FROM list, not that it isn't working as expected in any way. I have no objection to adding a TODO item to extend it, but note that the restriction is trivial to work around: CREATE TABLE lotsarrays ( id serial primary key, arr1 int[], arr2 numeric[], arr3 boolean[] ); INSERT INTO lotsarrays(arr1, arr2, arr3) VALUES (ARRAY[1,2], ARRAY[1.1, 2.2], ARRAY[true, false]), (ARRAY[10,20,30], ARRAY[10.1,20.2, 30.3], ARRAY[true, false, true]); CREATE OR REPLACE FUNCTION unnest_ordinality(anyarray) RETURNS TABLE(element_number bigint, element anyelement) AS $$ SELECT ord, elt FROM unnest($1) WITH ORDINALITY AS t(elt, ord) $$ LANGUAGE sql STRICT IMMUTABLE; SELECT id, (unnest_ordinality(arr1)).element_number as array_index, unnest(arr1) as arr1, unnest(arr2)as arr2, unnest(arr3) as arr3 FROM lotsarrays;id | array_index | arr1 | arr2 | arr3 ----+-------------+------+------+------ 1 | 1 | 1 | 1.1 | t 1 | 2 | 2 | 2.2 | f 2 | 1 | 10 | 10.1 | t 2 | 2 | 20 | 20.2 | f 2 | 3 | 30 | 30.3 | t (5 rows) Personally I'm not a fan of SRFs in the select list, especially not multiple SRFs there, since the results are hard to deal with if they return differing numbers of rows. So I would tend to write this as a LATERAL FULL join on the ordinality columns: SELECT id, COALESCE(u1.ord, u2.ord, u3.ord) AS array_index, u1.arr1, u2.arr2, u3.arr3 FROM lotsarrays, unnest(arr1)WITH ORDINALITY AS u1(arr1, ord) FULL JOIN unnest(arr2) WITH ORDINALITY AS u2(arr2, ord) ON u2.ord = u1.ord FULLJOIN unnest(arr3) WITH ORDINALITY AS u3(arr3, ord) ON u3.ord = COALESCE(u1.ord, u2.ord); Either way, I think the WITH ORDINALITY patch is working as expected. Regards, Dean
On Mon, Jun 24, 2013 at 03:04:04PM +0100, Dean Rasheed wrote: > On 21 June 2013 08:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On 21 June 2013 08:02, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > >> On 21 June 2013 06:54, David Fetter <david@fetter.org> wrote: > >>>> For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" > >>> > >>> The spec is pretty specific about the "all or none" nature of naming > >>> in the AS clause...unless we can figure out a way of getting around it > >>> somehow. > >> > >> We already support and document the ability to provide a subset of the > >> columns in the alias. I didn't realise that was beyond the spec, but > >> since we already have it... > >> > >> > >>> I'm weighing in on the side of a name that's like ?columnN? elsewhere > >>> in the code, i.e. one that couldn't sanely be an actual column name, > >>> whether in table, view, or SRF. > >> > >> I don't think we need to be overly concerned with coming up with a > >> unique name for the column. Duplicate column names are fine, except if > >> the user wants to refer to them elsewhere in the query, in which case > >> they need to provide aliases to distinguish them, otherwise the query > >> won't be accepted. > >> > >> I'd be happy if you just replaced "?column?" with ordinality in a > >> couple of places in your original patch. > >> > > > > Expanding on that, I think it's perfectly acceptable to allow > > potentially duplicate column names in this context. For the majority > > of simple queries the user wouldn't have to (and wouldn't feel > > compelled to) supply aliases. Where there was ambiguity they would be > > forced to do so, but people are already used to that. > > > > If I write a simple query that selects from a single unnest() with or > > without ordinality, I probably won't want to supply aliases. > > > > If I select from 2 unnest()'s *without* ordinality, I already have to > > provide aliases. > > > > If I select from 2 SRF's functions with ordinality, I won't be too > > surprised if I am also forced to provide aliases. > > > > No one else has expressed an opinion about the naming of the new > column. All other review comments have been addressed, and the patch > looks to be in good shape, so I'm marking this as ready for committer. > > IMO it's a very useful piece of new functionality. Good job! Thanks! Please find attach another rev of the patch which reflects the de-reserving of OVER. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On Sun, Jun 30, 2013 at 06:00:35PM -0700, David Fetter wrote: > On Mon, Jun 24, 2013 at 03:04:04PM +0100, Dean Rasheed wrote: > > On 21 June 2013 08:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > On 21 June 2013 08:02, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > >> On 21 June 2013 06:54, David Fetter <david@fetter.org> wrote: > > >>>> For example "SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file" > > >>> > > >>> The spec is pretty specific about the "all or none" nature of naming > > >>> in the AS clause...unless we can figure out a way of getting around it > > >>> somehow. > > >> > > >> We already support and document the ability to provide a subset of the > > >> columns in the alias. I didn't realise that was beyond the spec, but > > >> since we already have it... > > >> > > >> > > >>> I'm weighing in on the side of a name that's like ?columnN? elsewhere > > >>> in the code, i.e. one that couldn't sanely be an actual column name, > > >>> whether in table, view, or SRF. > > >> > > >> I don't think we need to be overly concerned with coming up with a > > >> unique name for the column. Duplicate column names are fine, except if > > >> the user wants to refer to them elsewhere in the query, in which case > > >> they need to provide aliases to distinguish them, otherwise the query > > >> won't be accepted. > > >> > > >> I'd be happy if you just replaced "?column?" with ordinality in a > > >> couple of places in your original patch. > > >> > > > > > > Expanding on that, I think it's perfectly acceptable to allow > > > potentially duplicate column names in this context. For the majority > > > of simple queries the user wouldn't have to (and wouldn't feel > > > compelled to) supply aliases. Where there was ambiguity they would be > > > forced to do so, but people are already used to that. > > > > > > If I write a simple query that selects from a single unnest() with or > > > without ordinality, I probably won't want to supply aliases. > > > > > > If I select from 2 unnest()'s *without* ordinality, I already have to > > > provide aliases. > > > > > > If I select from 2 SRF's functions with ordinality, I won't be too > > > surprised if I am also forced to provide aliases. > > > > > > > No one else has expressed an opinion about the naming of the new > > column. All other review comments have been addressed, and the patch > > looks to be in good shape, so I'm marking this as ready for committer. > > > > IMO it's a very useful piece of new functionality. Good job! > > Thanks! > > Please find attach another rev of the patch which reflects the > de-reserving of OVER. Patch re-jiggered for recent changes to master. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On 4 July 2013 00:08, David Fetter <david@fetter.org> wrote: > Patch re-jiggered for recent changes to master. > I re-validated this, and it all still looks good, so still ready for committer IMO. Regards, Dean
On 07/04/2013 10:15 AM, Dean Rasheed wrote: > On 4 July 2013 00:08, David Fetter <david@fetter.org> wrote: >> Patch re-jiggered for recent changes to master. >> > I re-validated this, and it all still looks good, so still ready for > committer IMO. I tried to check this out, too and "make check" fails with the following. I have not looked into the cause. $ cat /home/vik/postgresql/git/src/test/regress/log/initdb.log Running in noclean mode. Mistakes will not be cleaned up. The files belonging to this database system will be owned by user "vik". This user must also own the server process. The database cluster will be initialized with locales COLLATE: en_US.UTF-8 CTYPE: en_US.UTF-8 MESSAGES: C MONETARY: en_US.UTF-8NUMERIC: en_US.UTF-8 TIME: en_US.UTF-8 The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory /home/vik/postgresql/git/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB creating configuration files ... ok creating template1 database in /home/vik/postgresql/git/src/test/regress/./tmp_check/data/base/1 ... ok initializing pg_authid ... ok initializing dependencies ... FATAL: role with OID 256 does not exist STATEMENT: DELETE FROM pg_depend; child process exited with exit code 1 initdb: data directory "/home/vik/postgresql/git/src/test/regress/./tmp_check/data" not removed at user's request
On 07/05/2013 04:51 PM, Vik Fearing wrote: > I tried to check this out, too and "make check" fails with the > following. I have not looked into the cause. Running ./configure again fixed it. Sorry for the noise.
On Fri, Jul 05, 2013 at 04:58:30PM +0200, Vik Fearing wrote: > On 07/05/2013 04:51 PM, Vik Fearing wrote: > > I tried to check this out, too and "make check" fails with the > > following. I have not looked into the cause. > > Running ./configure again fixed it. Sorry for the noise. If I had a nickel for every apparent failure of this nature, I'd never need to work again. Thanks for checking :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Jun 26, 2013 at 2:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Some of the rest of us would like to hear those reasons, because my > immediate reaction is that the patch must be broken by design. WITH > ORDINALITY should not be needing to mess with the fundamental evaluation > semantics of SRFs, but it sure sounds like it is doing so if that case > doesn't work as expected. As Dan pointed out later the patch is not affecting whether this case works as expected. Only that since you can only use WITH ORDINALITY on SRFs in the FROM list and not in the target list if you want to use it you have to rewrite this query to put the SRFs in the FROM list. I think we're ok with that since SRFs in the target list are something that already work kind of strangely and probably we would rather get rid of rather than work to extend. It would be hard to work ordinality into the grammar in the middle of the target list let alone figure out how to implement it. My only reservation with this patch is whether the WITH_ORDINALITY parser hack is the way we want to go. The precedent was already set with WITH TIME ZONE though and I think this was the best option. The worst failure I can come up with is this which doesn't seem like a huge problem: postgres=# with o as (select 1 ) select * from o;?column? ---------- 1 (1 row) postgres=# with ordinality as (select 1 ) select * from ordinality; ERROR: syntax error at or near "ordinality" LINE 1: with ordinality as (select 1 ) select * from ordinality; ^ -- greg
So the more I look at this patch the fewer things I want to change in it. I've several times thought I should make an improvement and then realized I was really just making unnecessary tweaks that didn't really make much difference. It seems a shame that the node has to call the function and get back a slot only to laboriously copy over all the attributes into a new slot. Worse, it's actually storing all the original tuples in a tuplestore without the ordinality and adding in the ordinality on output. This works because the FunctionScan only supports rescan, not mark/restore so it can easily recalculate them consistently if the tuplestore is rescanned. I looked into what it would take to get the ordinality added on the original scan and it would have to go so deep that it doesn't really seem worthwhile. I do find the logic and variable names a bit confusing so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need to make these odd "scanslot means this unless we have ordinality in which case it means that and funcslot means this" logic. That has the side benefit that if the executor node ever wants to add more attributes it won't have this assumption that the last column is the ordinality baked in. I think it'll make the code a bit clearer. Also, I really think the test cases are excessive. They're mostly repeatedly testing the same functionality over and over in cases that are superficially different but I'm fairly certain just invoke the same codepaths. This will just be an annoyance if we make later changes that require adjusting the output. Notably the test that covers the view defintiion appears in pg_views scares me. We bash around the formatting rules for view definition outputs pretty regularly. On the other hand it is nice to have regression tests that make sure these cases are covered. There's been more than one bug in the past caused by omitting updating these functions. I'm leaning towards leaving it in but in the long run we probably need a better solution for this. Oh, and I'm definitely leaning towards naming the column "ordinality". Given we name columns things like "generate_series" and "sum" etc there doesn't seem to be good reason to avoid doing that here. All in all though I feel like I'm looking for trouble. It's not a very complex patch and is definitely basically correct. Who should get credit for it? David? Andrew? And who reviewed it? Dean? Anyone else?
Greg Stark <stark@mit.edu> writes: > I do find the logic and variable names a bit confusing so I'm tempted > to add some comments based on my initial confusion. And I'm tempted to > add an ordinalityAttNum field to the executor node so we don't need to > make these odd "scanslot means this unless we have ordinality in which > case it means that and funcslot means this" logic. I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics. Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional. regards, tom lane
On Mon, Jul 22, 2013 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I haven't read this patch, but that comment scares the heck out of me. > Even if the patch isn't broken today, it will be tomorrow, if it's > making random changes like that in data structure semantics. It's not making random changes. It's just that it has two code paths, in one it has the function scan write directly to the scan slot and in the other it has the function write to a different slot and copies the fields over to the scan slot. It's actually doing the right thing it's just hard to tell that at first (imho) because it's using the scan slots to determine which case applies instead of having a flag something to drive the decision. > Also, if you're confused, so will be everybody else who has to deal with > the code later --- so I don't think fixing the comments and variable > names is optional. Well that's the main benefit of having someone review the code in my opinion. I'm no smarter than David or Andrew who wrote the code and there's no guarantee I'll spot any bugs. But at least I can observe myself and see where I have difficulty following the logic which the original author is inherently not in a position to do. Honestly this is pretty straightforward and well written so I'm just being especially careful not having committed anything recently. -- greg
Greg Stark said: > So the more I look at this patch the fewer things I want to change in > it. I've several times thought I should make an improvement and then > realized I was really just making unnecessary tweaks that didn't > really make much difference. It's almost as though we actually thought about these things when writing the patch... > I looked into what it would take to get the ordinality added on the > original scan and it would have to go so deep that it doesn't really > seem worthwhile. A design goal was that no SRF implementation should be affected by the change, since there are dozens of C-language SRFs in contrib and third- party modules. The existence of materialize mode prevents us from changing the structure of the tuplestore, since we're not the ones allocating it. The only other approach that seemed possible was to have the tuplestore code itself add an ordinality, which it would have to do unconditionally since for materialize-mode SRFs it would have no way to know if one was required. Doing it in FunctionScan when projecting out tuples seemed much, much cleaner. > I do find the logic and variable names a bit confusing so I'm tempted > to add some comments based on my initial confusion. And I'm tempted to > add an ordinalityAttNum field to the executor node so we don't need to > make these odd "scanslot means this unless we have ordinality in which > case it means that and funcslot means this" logic. That has the side > benefit that if the executor node ever wants to add more attributes it > won't have this assumption that the last column is the ordinality > baked in. I think it'll make the code a bit clearer. I admit the (one single) use of checking func_slot for nullity rather than checking the funcordinality flag is a micro-optimization (choosing to fetch the value we know we will need rather than a value which has no other purpose). I thought the comments there were sufficient; perhaps you could indicate what isn't clear? Having the ordinality be the last column is required by spec - I'm sure we could think of pg-specific extensions that would change that, but why complicate the code now? > Also, I really think the test cases are excessive. They're mostly > repeatedly testing the same functionality over and over in cases that > are superficially different but I'm fairly certain just invoke the > same codepaths. This will just be an annoyance if we make later > changes that require adjusting the output. The majority of the tests are adding an "ordinality" version to existing test cases that test a number of combinations of language, SETOF, and base vs. composite type. These do exercise various different code paths in expandRTE and thereabouts. One thing I did do is dike out and replace the entire existing test sequence that was commented as testing ExecReScanFunctionScan, because many of the tests in it did not in fact invoke any rescans and probably hadn't done since 7.4. > Notably the test that covers the view defintiion appears in pg_views > scares me. We bash around the formatting rules for view definition > outputs pretty regularly. On the other hand it is nice to have > regression tests that make sure these cases are covered. There's been > more than one bug in the past caused by omitting updating these > functions. I'm leaning towards leaving it in but in the long run we > probably need a better solution for this. There are a dozen of those kinds of tests scattered through the regression tests (though many use pg_get_viewdef directly rather than pg_views). While it might be worth centralizing them somewhere, that's really a separate issue from this patch, since it also affects aggregates.sql, window.sql, and with.sql. > Oh, and I'm definitely leaning towards naming the column "ordinality". > Given we name columns things like "generate_series" and "sum" etc > there doesn't seem to be good reason to avoid doing that here. I've already set out why I object to this. > All in all though I feel like I'm looking for trouble. It's not a very > complex patch and is definitely basically correct. Who should get > credit for it? David? Andrew? And who reviewed it? Dean? Anyone else? It was a joint project between David and myself. Credit to Dean for the thorough review. -- Andrew (irc:RhodiumToad)
On Mon, Jul 22, 2013 at 05:19:35PM +0100, Greg Stark wrote: > On Mon, Jul 22, 2013 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I haven't read this patch, but that comment scares the heck out of me. > > Even if the patch isn't broken today, it will be tomorrow, if it's > > making random changes like that in data structure semantics. > > It's not making random changes. It's just that it has two code paths, > in one it has the function scan write directly to the scan slot and in > the other it has the function write to a different slot and copies the > fields over to the scan slot. It's actually doing the right thing it's > just hard to tell that at first (imho) because it's using the scan > slots to determine which case applies instead of having a flag > something to drive the decision. > > > Also, if you're confused, so will be everybody else who has to deal with > > the code later --- so I don't think fixing the comments and variable > > names is optional. > > Well that's the main benefit of having someone review the code in my > opinion. I'm no smarter than David or Andrew who wrote the code and > there's no guarantee I'll spot any bugs. But at least I can observe > myself and see where I have difficulty following the logic which the > original author is inherently not in a position to do. > > Honestly this is pretty straightforward and well written so I'm just > being especially careful not having committed anything recently. It turns out Andrew Gierth found two issues: backward scanning and docs mentioning incorrect data type for the ordinality column. The attached patch fixes both. His chunk is the backward scans; mine the one-word change :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
Tom Lane said: > I haven't read this patch, but that comment scares the heck out of me. > Even if the patch isn't broken today, it will be tomorrow, if it's > making random changes like that in data structure semantics. > Also, if you're confused, so will be everybody else who has to deal with > the code later --- so I don't think fixing the comments and variable > names is optional. I must admit to finding all of this criticism of unread code a bit bizarre. There are no "random changes in data structure semantics". All that happens is that FunctionScan, in the ordinality case, has two tupdescs to deal with: the one for the function return, and the one for FunctionScan's own scan type. Likewise two slots, one of each type. Absolutely no liberties are taken with any of the semantics. However, since the scan structure already has a place for the scan result slot, the "extra" slot that we allocate for this case is the function result, func_slot, while in the non-ordinality case, we use the scan result slot for the function result too. [Greg: we just found a bug (actually two, one code + one docs); I think David just posted the fixed version. And ironically, the bug in the code has nothing to do with all of this discussion.] Here, to hopefully end the issue, is the new version of FunctionNext, which is the core of the whole patch (everything else is just setup for this). If anyone wants to point out exactly what is unclear, or which changes any semantics improperly, then please do indicate exactly what you are referring to. /* ----------------------------------------------------------------* FunctionNext** This is a workhorse for ExecFunctionScan*----------------------------------------------------------------*/ static TupleTableSlot * FunctionNext(FunctionScanState *node) { EState *estate; ScanDirection direction; Tuplestorestate *tuplestorestate; TupleTableSlot *scanslot; TupleTableSlot*funcslot; if (node->func_slot) { /* * ORDINALITY case: FUNCSLOT is the function return, * SCANSLOT the scanresult */ funcslot = node->func_slot; scanslot = node->ss.ss_ScanTupleSlot; } else { funcslot = node->ss.ss_ScanTupleSlot; scanslot = NULL; } /* * get information from the estate and scan state */ estate = node->ss.ps.state; direction = estate->es_direction; tuplestorestate = node->tuplestorestate; /* * If first time through, read all tuples from function and put them in a * tuplestore. Subsequent calls justfetch tuples from tuplestore. */ if (tuplestorestate == NULL) { node->tuplestorestate = tuplestorestate= ExecMakeTableFunctionResult(node->funcexpr, node->ss.ps.ps_ExprContext, node->func_tupdesc, node->eflags & EXEC_FLAG_BACKWARD); } /* * Get the next tuple from tuplestore. Return NULL if no more tuples. */ (void) tuplestore_gettupleslot(tuplestorestate, ScanDirectionIsForward(direction), false, funcslot); if (!scanslot) return funcslot; /* * we're doing ordinality, so we copy the values from the function return * slot to the (distinct) scan slot.We can do this because the lifetimes * of the values in each slot are the same; until we reset the scan or * fetchthe next tuple, both will be valid. */ ExecClearTuple(scanslot); if (ScanDirectionIsForward(direction)) node->ordinal++; else node->ordinal--; if (!TupIsNull(funcslot)) { int natts = funcslot->tts_tupleDescriptor->natts; int i; slot_getallattrs(funcslot); for (i = 0; i < natts; ++i) { scanslot->tts_values[i] = funcslot->tts_values[i]; scanslot->tts_isnull[i]= funcslot->tts_isnull[i]; } scanslot->tts_values[natts] = Int64GetDatumFast(node->ordinal); scanslot->tts_isnull[natts] = false; ExecStoreVirtualTuple(scanslot); } return scanslot; } -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > I must admit to finding all of this criticism of unread code a bit > bizarre. If you yourself are still finding bugs in the code as of this afternoon, onlookers could be forgiven for doubting that the code is quite as readable or ready to commit as all that. regards, tom lane
On 23 July 2013 06:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >> I must admit to finding all of this criticism of unread code a bit >> bizarre. > > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. > I had another look at this --- the bug fix looks reasonable, and includes a sensible set of additional regression tests. This was not a bug that implies anything fundamentally wrong with the patch. Rather it was just a fairly trivial easy-to-overlook bug of omission --- I certainly overlooked it in my previous reviews (sorry) and at least 3 more experienced hackers than me overlooked it during detailed code inspection. I don't think that really reflects negatively on the quality of the patch, or the approach taken, which I still think is good. That's also backed up by the fact that Greg isn't able to find much he wants to change. I also don't see much that needs changing in the patch, except possibly in the area where Greg expressed concerns over the comments and code clarity. I had similar concerns during my inital review, so I tend to agree that perhaps it's worth adding a separate boolean has_ordinality flag to FunctionScanState just to improve code readability. FWIW, I would probably have extended FunctionScanState like so: /* ----------------* FunctionScanState information** Function nodes are used to scan the results of a* functionappearing in FROM (typically a function returning set).** eflags node's capability flags* tupdesc node's expected return tuple description* tuplestorestate private state of tuplestore.c* funcexpr state for function expression being evaluated* has_ordinality function scan WITH ORDINALITY?* func_tupdesc for WITH ORDINALITY, the raw function tuple* description, withoutthe added ordinality column* func_slot for WITH ORDINALITY, a slot for the raw function* return tuples* ordinal for WITH ORDINALITY, the ordinality of the return* tuple* ----------------*/ typedef struct FunctionScanState { ScanState ss; /* its first field is NodeTag */ int eflags; TupleDesc tupdesc; Tuplestorestate*tuplestorestate; ExprState *funcexpr; bool has_ordinality; /* these fields are used for a functionscan WITH ORDINALITY */ TupleDesc func_tupdesc; TupleTableSlot *func_slot; int64 ordinal; } FunctionScanState; Regards, Dean
Tom Lane said: > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. Right, and we all know that all code is perfect when committed. sheesh. (This is actually the first time in six months that I've had occasion to look at that part of the code; that's how long it's been sitting in the queue. And yes, it was one of my bits, not David's. Maybe I should have left the bug in to see how long it took you to spot it?) What I'm very notably not seeing from you is any substantive feedback. You've repeatedly described this patch as broken on the basis of nothing more than garbled hearsay while loudly proclaiming not to have actually looked at it; I find this both incomprehensible and insulting. If you have legitimate technical concerns then let's hear them, now. -- Andrew (irc:RhodiumToad)
Andrew, * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote: > Right, and we all know that all code is perfect when committed. sheesh. That clearly wasn't what was claimed. > (This is actually the first time in six months that I've had occasion > to look at that part of the code; that's how long it's been sitting in > the queue. While such issues are frustrating for all of us, huffing about it here isn't particularly useful. > And yes, it was one of my bits, not David's. Maybe I > should have left the bug in to see how long it took you to spot it?) That attitude is certainly discouraging. > What I'm very notably not seeing from you is any substantive feedback. > You've repeatedly described this patch as broken on the basis of > nothing more than garbled hearsay while loudly proclaiming not to have > actually looked at it; I find this both incomprehensible and insulting. As Greg is the one looking to possibly commit this, I certainly didn't consider his comments on the patch to be garbled hearsay. It would have been great if he had been more clear in his original comments, but I don't feel that you can fault any of us for reading his email and voicing what concerns we had from his review. While you might wish that we all read every patch submitted, none of us has time for that- simply keeping up with this mailing list requires a significant amount of time. > If you have legitimate technical concerns then let's hear them, now. Fine- I'd have been as happy leaving this be and letting Greg commit it, but if you'd really like to hear my concerns, I'd start with pointing out that it's pretty horrid to have to copy every record around like this and that the hack of CreateTupleDescCopyExtend is pretty terrible and likely to catch people by surprise. Having FunctionNext() operate very differently depending on WITH ORDINALITY is ugly and the cause of the bug that was found. All-in-all, I'm not convinced that this is really a good approach and feel it'd be better off implemented at a different level, eg a node type instead of a hack on top of the existing SRF code. Now, what would be great to see would be your response to Dean's comments and suggestions rather than berating someone who's barely written 5 sentences on this whole thread. Thanks, Stephen
On Tue, Jul 23, 2013 at 9:27 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. Fwiw I've been mulling over the same questions and came to the conclusion that the existing approach makes the most sense. In an ideal world an extra executor node would be the prettiest, cleanest implementation. But the amount of extra code and busywork that would necessitate just isn't justified for the amount of work it would be doing. It might be different if WITH ORDINALITY made sense for any other types of target tables. But it really only makes sense for SRFs. The whole motivation for having them in the spec is that UNNEST is taking an ordered list and turning it into a relation which is unordered. You can't just do row_number() because there's nothing to make it ordered by. By the same token for any other data source you would just use row_number *precisely because* any other data source is unordered and you should have to specify an order in order to make row_number produce something meaningful. So given that WITH ORDINALITY is really only useful for UNNEST and we can generalize it to all SRFs on the basis that Postgres SRFs do produce ordered sets not unordered relations it isn't crazy for the work to be in the Function node. Now that I've written that though it occurs to me to wonder whether FDW tables might be usefully said to be ordered too though? -- greg
* Greg Stark (stark@mit.edu) wrote: > So given that WITH ORDINALITY is really only useful for UNNEST and we > can generalize it to all SRFs on the basis that Postgres SRFs do > produce ordered sets not unordered relations it isn't crazy for the > work to be in the Function node. I agree, it isn't *crazy*. :) > Now that I've written that though it occurs to me to wonder whether > FDW tables might be usefully said to be ordered too though? My thought around this was a VALUES() construct, but FDWs are an interesting case to consider also; with FDWs it's possible that something is said in SQL/MED regarding this question- perhaps it would make sense to look there? Thanks, Stephen
On Tue, Jul 23, 2013 at 05:23:17PM -0400, Stephen Frost wrote: > * Greg Stark (stark@mit.edu) wrote: > > So given that WITH ORDINALITY is really only useful for UNNEST and we > > can generalize it to all SRFs on the basis that Postgres SRFs do > > produce ordered sets not unordered relations it isn't crazy for the > > work to be in the Function node. > > I agree, it isn't *crazy*. :) > > > Now that I've written that though it occurs to me to wonder whether > > FDW tables might be usefully said to be ordered too though? > > My thought around this was a VALUES() construct, but FDWs are an > interesting case to consider also; with FDWs it's possible that > something is said in SQL/MED regarding this question- perhaps it would > make sense to look there? There are a lot of ways foreign tables don't yet act like local ones. Much as I'm a booster for fixing that problem, I'm thinking improvements in this direction are for a separate patch. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David
On Tuesday, July 23, 2013, David Fetter wrote:
On Tuesday, July 23, 2013, David Fetter wrote:
There are a lot of ways foreign tables don't yet act like local ones.
Much as I'm a booster for fixing that problem, I'm thinking
improvements in this direction are for a separate patch.
This isn't about making FDWs more like local tables- indeed, quite the opposite. The question is if we should declare that WITH ORDINALITY will only ever be for SRFs or if we should consider that it might be useful with FDWs specifically because they are not unordered sets as tables are. Claiming that question is unrelated to the implementation of WITH ORDINALITY is rather... Bizarre.
Thanks,
Stephen
On Tue, Jul 23, 2013 at 06:09:20PM -0400, Stephen Frost wrote: > David > > On Tuesday, July 23, 2013, David Fetter wrote: > > > > There are a lot of ways foreign tables don't yet act like local > > ones. Much as I'm a booster for fixing that problem, I'm thinking > > improvements in this direction are for a separate patch. > > > > This isn't about making FDWs more like local tables- indeed, quite > the opposite. The question is if we should declare that WITH > ORDINALITY will only ever be for SRFs or if we should consider that > it might be useful with FDWs specifically because they are not > unordered sets as tables are. Claiming that question is unrelated > to the implementation of WITH ORDINALITY is rather... Bizarre. Are you saying that there's stuff that if I don't put it in now will impede our ability to add this to FTs later? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tuesday, July 23, 2013, David Fetter wrote:
Are you saying that there's stuff that if I don't put it in now will
impede our ability to add this to FTs later?
I'm saying that it'd be a completely different implementation and that this one would get in the way and essentially have to be ripped out.
No one is saying that this patch wouldn't work for the specific use-case that it set out to meet, and maybe it's unfair for us to consider possible use-cases beyond the patch's goal and the spec requirement, but that, IMO, is also one of the things that makes PG great. MVCC isn't necessary and isn't required by spec either.
Thanks,
Stephen
Stephen Frost said: > [stuff about foreign tables] I think the issue with foreign tables is probably moot because if you _did_ want to have some types of foreign tables with a fixed ordinality, you'd probably also want qual pushdown to work for it (i.e. so that WHERE rownum=123 doesn't have to filter all the rows), whereas with SRFs this doesn't really apply. For this to work, foreign tables with a fixed ordering would have to provide that in the FDW - which is in any case the only place that knows whether a fixed order would even make any sense. So I see no overlap here with the SRF ordinality case. As for VALUES, the spec regards those as constructing a table and therefore not having any inherent order - the user can add their own ordinal column if need be. Even if you did want to add WITH ORDINALITY for VALUES, though, it would actually make more sense to do it in the Values Scan node since that maintains its own ordinal position already. -- Andrew (irc:RhodiumToad)
Stephen Frost <sfrost@snowman.net> writes: > * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote: >> If you have legitimate technical concerns then let's hear them, now. > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. I took the time to read through this patch, finally. i think the $64 question is pretty much what Stephen says above: do we like tying this behavior to FunctionScan, as opposed to having it be some kind of expression node? You could certainly imagine a WithOrdinality expression node that takes in values of a set-returning expression, and returns them with an extra column tacked on. This would resolve the problem that was mentioned awhile back that the current approach can't support SRFs in targetlists. If it weren't that we've been speculating for years about deprecating SRFs-in-tlists once we had LATERAL, I would personally consider this patch DOA in this form. If we do think we'll probably deprecate that feature, then not extending WITH ORDINALITY to such cases is at least defensible. On the other hand, considering that we've yet to ship a release supporting LATERAL, it's probably premature to commit to such deprecation --- we don't really know whether people will find LATERAL to be a convenient and performant substitute. As far as performance goes, despite Stephen's gripe above, I think this way is likely better than any alternative. The reason is that once we've disassembled the function result tuple and tacked on the counter, we have a reasonable shot at things staying like that (in the form of a virtual TupleTableSlot). The next higher level of evaluation can probably use the column Datums as-is. A WithOrdinality expression node would have to disassemble the input tuple and then make a new tuple, which the next higher expression level would likely pull apart again :-(. Also, any other approach would lead to needing to store the ordinality values inside the FunctionScan's tuplestore, bloating that storage with rather-low-value data. The other big technical issue I see is representation of the rowtype of the result. If we did it with a WithOrdinality expression node, the result would always be of type RECORD, and we'd have to use blessed tuple descriptors to keep track of exactly which record type a particular instance emits. This is certainly do-able (see RowExpr for precedent). Attaching the functionality to FunctionScan reduces the need for that because the system mostly avoids trying to associate any type OID with the rowtype of a FROM item. Instead though, we've got a lot of ad-hoc code that deals with RangeTblEntry type information. A big part of the patch is dealing with extending that code, and frankly I've got about zero confidence that the patch has found everything that needs to be found in that line. A patch using an expression node would probably need to touch only a much more localized set of places to handle the type description issue. Anyway, on balance I'm satisfied with this overall approach, though it's not without room for debate. I am fairly dissatisfied with the patch at a stylistic level, though. It seems way too short on comments. People griped about the code in nodeFunctionscan in particular, but I think all the changes in ad-hoc type-management code elsewhere are even more deserving of comments, and they mostly didn't get that. Likewise, there's no documentation anywhere that I can see of the new interrelationships of struct fields, such as that if a RangeTblEntry has funcordinality set, then its various column-related fields such as eref->colnames include a trailing INT8 column for the ordinality. Also, maybe I'm misreading the patch (have I mentioned lately that large patches in -u format are practically illegible?), but it sure looks like it flat out removed several existing regression-test cases and a few existing comments as well. How is that considered acceptable? FWIW, I concur with the gripe I remember seeing upthread that the default name of the added column ought not be "?column?". regards, tom lane
On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > If it weren't that we've been speculating for years about deprecating > SRFs-in-tlists once we had LATERAL, I would personally consider this > patch DOA in this form. If we do think we'll probably deprecate that > feature, then not extending WITH ORDINALITY to such cases is at least > defensible. On the other hand, considering that we've yet to ship a > release supporting LATERAL, it's probably premature to commit to such > deprecation --- we don't really know whether people will find LATERAL > to be a convenient and performant substitute. I guess I'd sort of assumed that the plan was to continue accepting SRFs in tlists but rewrite them as lateral joins, rather than getting rid of them altogether. IIUC that would simplify some things inside the executor. I'd be a bit more reluctant to just ban SRFs in target lists outright. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If it weren't that we've been speculating for years about deprecating >> SRFs-in-tlists once we had LATERAL, I would personally consider this >> patch DOA in this form. > I guess I'd sort of assumed that the plan was to continue accepting > SRFs in tlists but rewrite them as lateral joins, rather than getting > rid of them altogether. That seems to me to be unlikely to happen, because it would be impossible to preserve the current (admittedly bad) semantics. If we're going to change the behavior at all we might as well just drop the feature, IMO. regards, tom lane
On Fri, Jul 19, 2013 at 1:50 PM, Greg Stark <stark@mit.edu> wrote: > My only reservation with this patch is whether the WITH_ORDINALITY > parser hack is the way we want to go. The precedent was already set > with WITH TIME ZONE though and I think this was the best option. I share this reservation. Lexer hacks are reasonable ways of getting LALR(2)-ish behavior in very simple cases, but it doesn't take much to get into trouble. I think the with ordinality as (select 1) select * from ordinality example you posted is precisely on point. Currently, we will have four classes of keywords: unreserved, column-name, type-function, and reserved. There are rules for when each of those types of keywords needs to be quoted, and those rules are relatively well-understood. This patch will introduce, without documentation, a fifth class of keyword. ORDINALITY will need to be quoted when, and only when, it immediately follows WITH. Without some change to our deparsing code, this is a dump/restore hazard; and with some such change it's still probably not a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 24, 2013 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> If it weren't that we've been speculating for years about deprecating >>> SRFs-in-tlists once we had LATERAL, I would personally consider this >>> patch DOA in this form. > >> I guess I'd sort of assumed that the plan was to continue accepting >> SRFs in tlists but rewrite them as lateral joins, rather than getting >> rid of them altogether. > > That seems to me to be unlikely to happen, because it would be > impossible to preserve the current (admittedly bad) semantics. > If we're going to change the behavior at all we might as well just > drop the feature, IMO. Maybe. I'd be kind of sad to lose some of the simple cases that work now, like SELECT srf(), in favor of having to write SELECT * FROM srf(). I'd probably get over it, but I'm sure a lot of people would be mildly annoyed at having to change their working application code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 24, 2013 at 6:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > That seems to me to be unlikely to happen, because it would be > impossible to preserve the current (admittedly bad) semantics. > If we're going to change the behavior at all we might as well just > drop the feature, IMO. It would be nice to support a single SRF in the target list. That would side-step the bad semantics and also make it easier to implement. But I'm not sure how easy it would be in practice because I've learned not to underestimate the difficulty of making seemingly small changes to the planner. -- greg
On Wed, Jul 24, 2013 at 6:39 PM, Robert Haas <robertmhaas@gmail.com> wrote: > This patch will introduce, without documentation, a fifth class of > keyword. ORDINALITY will need to be quoted when, and only when, it > immediately follows WITH. Without some change to our deparsing code, > this is a dump/restore hazard; and with some such change it's still > probably not a good idea. Strictly speaking this patc doesn't introduce this fifth class of keyword. We already had TIME in that category (and also FIRST and LAST in a similar category following NULLS). If we have a solution for WITH <keyword> then presumably we would implement it for WITH TIME and WITH ORDINALITY at the same time. In the interim I suppose we could teach pg_dump to quote any keyword that follows WITH or NULLS pretty easily. Or just quote those four words unconditionally. -- greg
Greg Stark <stark@mit.edu> writes: > On Wed, Jul 24, 2013 at 6:39 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> This patch will introduce, without documentation, a fifth class of >> keyword. ORDINALITY will need to be quoted when, and only when, it >> immediately follows WITH. Without some change to our deparsing code, >> this is a dump/restore hazard; and with some such change it's still >> probably not a good idea. > Strictly speaking this patc doesn't introduce this fifth class of > keyword. We already had TIME in that category (and also FIRST and LAST > in a similar category following NULLS). If we have a solution for WITH > <keyword> then presumably we would implement it for WITH TIME and WITH > ORDINALITY at the same time. It strikes me that we could hack the grammar for CTEs so that it allows WITH_ORDINALITY and WITH_TIME as the first token, with appropriate insertion of the proper identifier value. I'm not sure if there are any other places where WITH can be followed by a random identifier. I don't see any workable fix that doesn't involve the funny token, though. Consider CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH ORDINALITY; CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH NO DATA; WITH ORDINALITY really needs to be parsed as part of the FROM clause. WITH NO DATA really needs to *not* be parsed as part of the FROM clause. Without looking ahead more than one token, there is absolutely no way for the grammar to decide if it's got the whole FROM clause or not at the point where WITH is the next token. So our choices are to have two-token lookahead at the lexer level, or to give up on bison and find something that can implement a parsing algorithm better than LALR(1). I know which one seems more likely to get done in the foreseeable future. regards, tom lane
On 2013-07-24 13:36:39 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> If it weren't that we've been speculating for years about deprecating > >> SRFs-in-tlists once we had LATERAL, I would personally consider this > >> patch DOA in this form. > > > I guess I'd sort of assumed that the plan was to continue accepting > > SRFs in tlists but rewrite them as lateral joins, rather than getting > > rid of them altogether. > > That seems to me to be unlikely to happen, because it would be > impossible to preserve the current (admittedly bad) semantics. > If we're going to change the behavior at all we might as well just > drop the feature, IMO. I think removing the feature will be a rather painful procedure for users and thus will need a rather long deprecation period. The amount of code using SRFs in targetlists is quite huge if my experience is anything to go by. And much of that can trivially/centrally be rewritten to LATERAL, not to speak of the cross-version compatibility problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane said: > If we did it with a WithOrdinality expression node, the result would > always be of type RECORD, and we'd have to use blessed tuple > descriptors to keep track of exactly which record type a particular > instance emits. This is certainly do-able (see RowExpr for > precedent). Maybe RowExpr is a precedent for something, but it has this long-standing problem that makes it very hard to use usefully: postgres=# select (r).* from (select row(a,b) as r from (values (1,2)) v(a,b)) s; ERROR: record type has not been registered > It seems way too short on comments. [...] This can certainly be addressed. > but it sure looks like it flat out removed several existing > regression-test cases Here's why, in rangefuncs.sql: --invokes ExecReScanFunctionScan SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2; I don't think that has invoked ExecReScanFunctionScan since 7.4 or so. It certainly does not do so now (confirmed by gdb as well as by the query plan). By all means keep the old tests if you want a never-remove-tests-for-any-reason policy, but having added tests that actually _do_ invoke ExecReScanFunctionScan, I figured the old ones were redundant. (Also, these kinds of tests can be done a bit better now with values and lateral rather than creating and dropping tables just for the one test.) > and a few existing comments as well. I've double-checked, and I don't see any existing comments removed. > FWIW, I concur with the gripe I remember seeing upthread that the > default name of the added column ought not be "?column?". This seems to be a common complaint, but gives rise to two questions: 1) what should the name be? 2) should users be depending on it? I've yet to find another db that actually documents a specific column name for the ordinality column; it's always taken for granted that the user should always be supplying an alias. (Admittedly there are not many dbs that support it at all; DB2 does, and I believe Teradata.) -- Andrew (irc:RhodiumToad)
On Wed, Jul 24, 2013 at 09:38:15PM +0000, Andrew Gierth wrote: > Tom Lane said: > > If we did it with a WithOrdinality expression node, the result would > > always be of type RECORD, and we'd have to use blessed tuple > > descriptors to keep track of exactly which record type a particular > > instance emits. This is certainly do-able (see RowExpr for > > precedent). > > Maybe RowExpr is a precedent for something, but it has this > long-standing problem that makes it very hard to use usefully: > > postgres=# select (r).* from (select row(a,b) as r from (values (1,2)) v(a,b)) s; > ERROR: record type has not been registered > > > It seems way too short on comments. [...] > > This can certainly be addressed. > > > but it sure looks like it flat out removed several existing > > regression-test cases > > Here's why, in rangefuncs.sql: > > --invokes ExecReScanFunctionScan > SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2; > > I don't think that has invoked ExecReScanFunctionScan since 7.4 or so. > It certainly does not do so now (confirmed by gdb as well as by the > query plan). By all means keep the old tests if you want a > never-remove-tests-for-any-reason policy, but having added tests that > actually _do_ invoke ExecReScanFunctionScan, I figured the old ones > were redundant. (Also, these kinds of tests can be done a bit better > now with values and lateral rather than creating and dropping tables > just for the one test.) > > > and a few existing comments as well. > > I've double-checked, and I don't see any existing comments removed. > > > FWIW, I concur with the gripe I remember seeing upthread that the > > default name of the added column ought not be "?column?". > > This seems to be a common complaint, but gives rise to two questions: > > 1) what should the name be? > > 2) should users be depending on it? > > I've yet to find another db that actually documents a specific column > name for the ordinality column; it's always taken for granted that the > user should always be supplying an alias. (Admittedly there are not > many dbs that support it at all; DB2 does, and I believe Teradata.) Next patch: changes by Andrew Gierth, testing vs up-to-date git master by Yours Truly. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On Wed, Jul 24, 2013 at 1:50 PM, Greg Stark <stark@mit.edu> wrote: > On Wed, Jul 24, 2013 at 6:39 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> This patch will introduce, without documentation, a fifth class of >> keyword. ORDINALITY will need to be quoted when, and only when, it >> immediately follows WITH. Without some change to our deparsing code, >> this is a dump/restore hazard; and with some such change it's still >> probably not a good idea. > > Strictly speaking this patc doesn't introduce this fifth class of > keyword. We already had TIME in that category (and also FIRST and LAST > in a similar category following NULLS). If we have a solution for WITH > <keyword> then presumably we would implement it for WITH TIME and WITH > ORDINALITY at the same time. > > In the interim I suppose we could teach pg_dump to quote any keyword > that follows WITH or NULLS pretty easily. Or just quote those four > words unconditionally. Making these keywords reserved-enough they get quoted would indeed fix the problem. It may not be desirable for other reasons, but the fact that we have existing cases where pg_dump DTWT doesn't seem like a good reason to add more of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 25, 2013 at 10:33:54AM -0700, David Fetter wrote: > On Wed, Jul 24, 2013 at 09:38:15PM +0000, Andrew Gierth wrote: > > Tom Lane said: > > > If we did it with a WithOrdinality expression node, the result would > > > always be of type RECORD, and we'd have to use blessed tuple > > > descriptors to keep track of exactly which record type a particular > > > instance emits. This is certainly do-able (see RowExpr for > > > precedent). > > > > Maybe RowExpr is a precedent for something, but it has this > > long-standing problem that makes it very hard to use usefully: > > > > postgres=# select (r).* from (select row(a,b) as r from (values (1,2)) v(a,b)) s; > > ERROR: record type has not been registered > > > > > It seems way too short on comments. [...] > > > > This can certainly be addressed. > > > > > but it sure looks like it flat out removed several existing > > > regression-test cases > > > > Here's why, in rangefuncs.sql: > > > > --invokes ExecReScanFunctionScan > > SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2; > > > > I don't think that has invoked ExecReScanFunctionScan since 7.4 or so. > > It certainly does not do so now (confirmed by gdb as well as by the > > query plan). By all means keep the old tests if you want a > > never-remove-tests-for-any-reason policy, but having added tests that > > actually _do_ invoke ExecReScanFunctionScan, I figured the old ones > > were redundant. (Also, these kinds of tests can be done a bit better > > now with values and lateral rather than creating and dropping tables > > just for the one test.) > > > > > and a few existing comments as well. > > > > I've double-checked, and I don't see any existing comments removed. > > > > > FWIW, I concur with the gripe I remember seeing upthread that the > > > default name of the added column ought not be "?column?". > > > > This seems to be a common complaint, but gives rise to two questions: > > > > 1) what should the name be? > > > > 2) should users be depending on it? > > > > I've yet to find another db that actually documents a specific column > > name for the ordinality column; it's always taken for granted that the > > user should always be supplying an alias. (Admittedly there are not > > many dbs that support it at all; DB2 does, and I believe Teradata.) > > Next patch: changes by Andrew Gierth, testing vs up-to-date git master > by Yours Truly. Greg, Are you still on this? Do you have questions or concerns? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, Jul 28, 2013 at 6:49 AM, David Fetter <david@fetter.org> wrote: > Are you still on this? Do you have questions or concerns? Still on this, I've just been a bit busy the past few days. -- greg
On Wed, Jul 24, 2013 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't see any workable fix that doesn't involve the funny token, though. > Consider > > CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH ORDINALITY; > CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH NO DATA; > > WITH ORDINALITY really needs to be parsed as part of the FROM clause. > WITH NO DATA really needs to *not* be parsed as part of the FROM clause. > Without looking ahead more than one token, there is absolutely no way > for the grammar to decide if it's got the whole FROM clause or not > at the point where WITH is the next token. So our choices are to have > two-token lookahead at the lexer level, or to give up on bison and find > something that can implement a parsing algorithm better than LALR(1). > I know which one seems more likely to get done in the foreseeable future. It occurs to me we might be being silly here. Instead of collapsing WITH TIME and WITH ORDINALITY into a single token why don't we just modify the WITH token to WITH_FOLLOWED_BY_TIME and WITH_FOLLOWED_BY_ORDINALITY but still keep the following token. Then we can just include those two tokens everywhere we include WITH. Basically we would be giving the parser a free extra token of lookahead whenever it gets WITH. I think that's isomorphic to what Tom suggested but requires less surgery on the parser and automatically covers any other cases we don't need to track down. -- greg
Greg Stark <stark@mit.edu> writes: > Instead of collapsing WITH TIME and WITH ORDINALITY into a single > token why don't we just modify the WITH token to WITH_FOLLOWED_BY_TIME > and WITH_FOLLOWED_BY_ORDINALITY but still keep the following token. > Then we can just include those two tokens everywhere we include WITH. > Basically we would be giving the parser a free extra token of > lookahead whenever it gets WITH. > I think that's isomorphic to what Tom suggested but requires less > surgery on the parser and automatically covers any other cases we > don't need to track down. I suspect it's likely to come out about the same either way once you account for all the uses of WITH. Might be worth trying both to see which seems less ugly. regards, tom lane
I propose the following patch (which goes on top of the current ordinality one) to implement the suggested grammar changes. I think this is the cleanest way, and I've tested that it both passes regression and allows constructs like WITH time AS (...) to work. -- Andrew (irc:RhodiumToad)
Вложения
On 07/25/2013 02:01 AM, Andres Freund wrote: > And much of that can trivially/centrally be rewritten to LATERAL, not > to speak of the cross-version compatibility problem. An interesting example of that can be found here: http://stackoverflow.com/q/12414750/398670 where someone's looking for a way to "zip" two arrays pairwise into an array of arrays. As far as I can tell LATERAL won't help with this; you'd need unnest WITH ORDINALITY then a join on the ordinal, or you'd need full support for SQL UNNEST with multiple array arguments. Unless LATERAL provides a way to do lock-step iteration through a pair (or more) of functions I don't think we can get rid of SRFs-in-FROM just yet. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 07/29/2013 09:56 AM, Craig Ringer wrote: > Unless LATERAL provides a way to do lock-step iteration through a pair > (or more) of functions I don't think we can get rid of SRFs-in-FROM just > yet. I don't think anyone was arguing for that; we're talking about deprecating SRFs-in-SELECT.
On 07/29/2013 04:31 PM, Vik Fearing wrote: > On 07/29/2013 09:56 AM, Craig Ringer wrote: >> Unless LATERAL provides a way to do lock-step iteration through a pair >> (or more) of functions I don't think we can get rid of SRFs-in-FROM just >> yet. > > I don't think anyone was arguing for that; we're talking about > deprecating SRFs-in-SELECT. Argh, as I was I. A thinko; I was *thinking* SELECT and wrote FROM. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 29, 2013 at 8:56 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Unless LATERAL provides a way to do lock-step iteration through a pair > (or more) of functions I don't think we can get rid of SRFs [in select target lists] yet You don't even need lateral. This works fine: postgres=# select * from generate_series(1,10) with ordinality as a(a,o) natural full outer join generate_series(1,5) with ordinality as b(b,o) ; o | a | b ----+----+--- 1 | 1 | 1 2 | 2 | 2 3 | 3 | 3 4 | 4 | 4 5 | 5 | 5 6 | 6 | 7 | 7 | 8 | 8 | 9 | 9 |10 | 10 | (10 rows) However it occurs to me that the plan isn't ideal: postgres=# explain select * from generate_series(1,10) with ordinality as a(a,o) natural full outer join generate_series(1,5) with ordinality as b(b,o) ; QUERY PLAN ---------------------------------------------------------------------------------------Merge Full Join (cost=119.66..199.66rows=5000 width=24) Merge Cond: (a.o = b.o) -> Sort (cost=59.83..62.33 rows=1000 width=12) Sort Key: a.o -> Function Scan on generate_series a (cost=0.00..10.00 rows=1000 width=12) -> Sort (cost=59.83..62.33 rows=1000 width=12) Sort Key: b.o -> Function Scan on generate_seriesb (cost=0.00..10.00 rows=1000 width=12) (8 rows) I think all that's required to avoid the sorts is teaching the planner that the Path has a PathKey of the ordinal column. I can look at that later but I'll go ahead and commit it without it at first. I wonder if it's also useful to teach the planner that the column is unique. -- greg
On 07/29/2013 08:02 PM, Greg Stark wrote: >> > Unless LATERAL provides a way to do lock-step iteration through a pair >> > (or more) of functions I don't think we can get rid of SRFs [in select target lists] yet > You don't even need lateral. This works fine: > > postgres=# select * from generate_series(1,10) with ordinality as Exactly - that's why the previous paragraph was: >> As far as I can tell LATERAL won't help with this; you'd need unnest >> WITH ORDINALITY then a join on the ordinal, or you'd need full support >> for SQL UNNEST with multiple array arguments. ;-) I'm interested to see that it might be possible to evaluate multiple "WITH ORDINALITY" SRFs in FROM together rather than having to perform a join. That'd make it a much saner replacement for SRFs in the SELECT list. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jul 28, 2013 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I suspect it's likely to come out about the same either way once you > account for all the uses of WITH. Might be worth trying both to see > which seems less ugly. So I'm not really sure how to do it the other way. Once you're in parser rules I don't know how easy it is to start injecting tokens. But it seems cleaner this way where only the places where accepting WITH_ORDINALITY and WITH_TIME create conflicts need to worry about it. Everywhere else can just accept "with" and not worry about the problem. I did the same thing to NULLS_FIRST and NULLS_LAST but then I realized I couldn't actually fix the rules the same way. NULLS_P is in unreserved_keywords and adding NULLS_FIRST or NULLS_LAST there creates conflicts of course. This week isn't one of the two weeks of my life when I grokked LALR grammars and how to resolve conflicts in bison. Incidentally I noticed a problem that is actually a bug in the WITH ORDINALITY patch. The ecpg preprocessor perl script is broken now. Will fix. -- greg
Вложения
All: Because this patch is still being discussed and tinkered with, I have moved it to 9.4CF2. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Jul 29, 2013 at 8:45 PM, Josh Berkus <josh@agliodbs.com> wrote: > Because this patch is still being discussed and tinkered with, I have > moved it to 9.4CF2. Fwiw I already committed it. In the end I made only trivial changes the most significant of which was changing the column name to "ordinality". I found the changes I was making didn't really make much difference and were turning into bike shedding. There are two followup changes that were discussed in this thread: 1) Changing the WITH_* and NULLS_* tokens to not eat the following token if it's not used by the grammar so that it doesn't interfere with syntax like "select nulls first from t as a(nulls)". 2) Teaching the parser that the functionscan is ordered by ordinality so it can do a merge join without resorting the inputs. That would relieve the one remaining piece of functionality that multiple SRFs in the target list give over SRFs in the from list. I have patches for both of these in progress but frankly they're both stuck and I'm not likely to finish either without some advice. I'll start new threads (or already have) for them though. -- greg
Greg Stark escribió: > There are two followup changes that were discussed in this thread: > > 1) Changing the WITH_* and NULLS_* tokens to not eat the following > token if it's not used by the grammar so that it doesn't interfere > with syntax like "select nulls first from t as a(nulls)". > I have patches for both of these in progress but frankly they're both > stuck and I'm not likely to finish either without some advice. I'll > start new threads (or already have) for them though. Andrew Gierth submitted a patch for this ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 29, 2013 at 1:42 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > I propose the following patch (which goes on top of the current > ordinality one) to implement the suggested grammar changes. > > I think this is the cleanest way, and I've tested that it both > passes regression and allows constructs like WITH time AS (...) > to work. This looks like really nice work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<br /><div class="gmail_quote">On Mon, Aug 5, 2013 at 1:31 AM, Robert Haas <span dir="ltr"><<a href="mailto:robertmhaas@gmail.com"target="_blank">robertmhaas@gmail.com</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb adM"><div class="h5"><br/></div></div>This looks like really nice work.</blockquote></div><br />It does. It's functionally equivalentto my attempt but with much better comments and cleaner code. <br /><br />But it doesn't seem to cover the caseI was stumped on, namely "nulls first" appearing in a place where two unreserved keywords can appear consecutively. Thisdoesn't happen for WITH_* before "with" is a reserved keyword.<br /><br />Looking into it a bit I see that the case Iwas most worried about is actually a non-issue. We don't support column aliases without "AS" unless the alias is completelyunknown to the parser. That seems a bit of a strange rule that must make the syntax with the missing AS prettyunreliable if people are looking for code that will continue to work in future versions. I never knew about this.<br/><br />The only other case I could come up with in my regression tests is pretty esoteric:<br /><br />CREATE COLLATIONnulls (locale='C');<br />ALTER OPERATOR CLASS text_ops USING btree RENAME TO first;<br />CREATE TABLE nulls_first(ttext);<br /> CREATE INDEX nulls_first_i ON nulls_first(t COLLATE nulls first);<br /><br />I'm not 100% surethere aren't other cases where this can occur though.<br clear="all" /><br />-- <br />greg<br />
On Tue, Aug 06, 2013 at 11:10:11PM +0100, Greg Stark wrote: > On Mon, Aug 5, 2013 at 1:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > This looks like really nice work. > > It does. It's functionally equivalent to my attempt but with much better > comments and cleaner code. > > But it doesn't seem to cover the case I was stumped on, namely "nulls > first" appearing in a place where two unreserved keywords can appear > consecutively. This doesn't happen for WITH_* before "with" is a reserved > keyword. > > Looking into it a bit I see that the case I was most worried about is > actually a non-issue. We don't support column aliases without "AS" unless > the alias is completely unknown to the parser. That seems a bit of a > strange rule that must make the syntax with the missing AS pretty > unreliable if people are looking for code that will continue to work in > future versions. I never knew about this. > > The only other case I could come up with in my regression tests is pretty > esoteric: > > CREATE COLLATION nulls (locale='C'); > ALTER OPERATOR CLASS text_ops USING btree RENAME TO first; > CREATE TABLE nulls_first(t text); > CREATE INDEX nulls_first_i ON nulls_first(t COLLATE nulls first); I am pretty sure we dismiss as "pilot error" foolishness at levels much lower than this. > I'm not 100% sure there aren't other cases where this can occur though. If you don't find one considerably simpler, I'm inclined to say we should let it lie, possibly with docs--even user-visible ones if you think it's appropriate. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Aug 6, 2013 at 6:10 PM, Greg Stark <stark@mit.edu> wrote:
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The only other case I could come up with in my regression tests is pretty esoteric:
CREATE COLLATION nulls (locale='C');
ALTER OPERATOR CLASS text_ops USING btree RENAME TO first;
CREATE TABLE nulls_first(t text);
CREATE INDEX nulls_first_i ON nulls_first(t COLLATE nulls first);
I'm not 100% sure there aren't other cases where this can occur though.
Blech. Well, that's why we need to stop hacking the lexer before we shoot a hole through our foot that's too large to ignore. But it's not this patch's job to fix that problem.
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 13, 2013 at 8:20 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Blech. Well, that's why we need to stop hacking the lexer before we shoot a > hole through our foot that's too large to ignore. But it's not this patch's > job to fix that problem. Hm. I thought it was. However it turns out the NULLS FIRST and the WITH* problems are not exactly analogous. Because NULLS and FIRST are both unreserved keywords whereas WITH is a reserved keyword the problems are really different. Whereas WITH can be fixed by going through all the places in the grammar where WITH appears, NULLS FIRST really can't be fixed without reserving NULLS. -- greg