Discussion: Re: [GENERAL] Empty arrays with ARRAY[]
On Nov 26, 2007 3:58 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Mon, Nov 26, 2007 at 03:51:37AM +1100, Brendan Jurd wrote: > > I noticed in the 8.3 release notes that ARRAY(SELECT ...) now returns > > an empty array if there are no rows returned by the subquery. > > This has come up before, Tom had an idea about how to fix it: > > http://groups.google.com/group/pgsql.general/browse_thread/thread/911791e145a17daa/6b035035aeaac399 > http://www.mail-archive.com/pgsql-general@postgresql.org/msg90681.html [moving thread to -hackers] Thanks for the link Martijn. I'd be interested in taking a swing at this if nobody else has laid claim. Since that thread died back in January, I'm guessing it's wide open. Regards, BJ
Quoting Tom, from the previous thread linked by Martijn: > It could be pretty ugly, because type assignment normally proceeds > bottom-up :-(. What you might have to do is make the raw grammar > representation of ARRAY[] work like A_Const does, ie, there's a > slot to plug in a typecast. That's pretty much vestigial now for > A_Const, if memory serves, but it'd be needful if ARRAY[] has to > be able to "see" the typecast that would otherwise be above it in > the parse tree. This approach is making sense to me, but I've run into a bit of a dependency issue. A_Const does indeed have a slot for typecasts by way of a TypeName member. A_Const and TypeName are both defined in parsenodes.h, whereas ArrayExpr is defined in primnodes.h. So unfortunately I can't just add a TypeName member to ArrayExpr. I'm new to this area of the codebase (and parsers generally), so I'm treading carefully. What would be the best way to resolve this? Would moving TypeName into primnodes.h be acceptable? Thanks for your time, BJ
"Brendan Jurd" <direvus@gmail.com> writes: > This approach is making sense to me, but I've run into a bit of a > dependency issue. A_Const does indeed have a slot for typecasts by > way of a TypeName member. A_Const and TypeName are both defined in > parsenodes.h, whereas ArrayExpr is defined in primnodes.h. So > unfortunately I can't just add a TypeName member to ArrayExpr. That would be quite the wrong thing to do anyway, since ArrayExpr is a run-time representation and shouldn't have any such thing attached to it. What you probably need is a separate parse-time representation of ARRAY[], a la the difference between A_Const and Const. Another possibility is to just hack up a private communication path between transformExpr and transformArrayExpr, ie when you see TypeCast check to see if its argument is ArrayExpr and do something different. This would be a mite klugy but it'd be a much smaller patch that way. regards, tom lane
On Nov 27, 2007 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Brendan Jurd" <direvus@gmail.com> writes: > > ... So > > unfortunately I can't just add a TypeName member to ArrayExpr. > > That would be quite the wrong thing to do anyway, since ArrayExpr is > a run-time representation and shouldn't have any such thing attached > to it. What you probably need is a separate parse-time representation > of ARRAY[], a la the difference between A_Const and Const. > Ah. I wasn't aware of the distinction; I started by looking in gram.y and saw that the ARRAY parse path creates an ArrayExpr node, whilst the constant parse paths create A_Const nodes. I didn't realise that ArrayExpr was "skipping ahead" and creating the same kind of object that the transform produces. Glad I stopped and asked for directions then. =) I'm not 100% clear on what the A_ prefix signifies ... is A_ArrayExpr a good name for the parse-time structure? Thanks for your time, BJ
"Brendan Jurd" <direvus@gmail.com> writes: > I'm not 100% clear on what the A_ prefix signifies ... is A_ArrayExpr > a good name for the parse-time structure? Yeah, might as well use that for consistency. The A_ doesn't seem very meaningful to me either, but I don't want to rename the existing examples ... regards, tom lane
So far I've only considered the '::' cast syntax suggested in the original proposal, e.g.:

    ARRAY[]::text[]

I wonder whether we are also interested in catching CAST(), e.g.:

    CAST(ARRAY[] AS text[])

I'm personally okay with leaving it at support for '::', but admittedly I am heavily biased towards this syntax (I find CAST very ugly). I suppose supporting CAST as well would be the more predictable behaviour; I think people might be surprised if we supported one form of casting but not the other.

Comments?

Regards,
BJ
"Brendan Jurd" <direvus@gmail.com> writes: > So far I've only considered the '::' cast syntax suggested in the > original proposal, e.g.: > ARRAY[]::text[] > I wonder whether we are also interested in catching CAST(), e.g.: > CAST(ARRAY[] AS text[]) I think you'll find that it's just about impossible to not handle both, because they look the same after the grammar gets done. regards, tom lane
On Nov 28, 2007 2:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wonder whether we are also interested in catching CAST(), e.g.: > > > CAST(ARRAY[] AS text[]) > > I think you'll find that it's just about impossible to not handle both, > because they look the same after the grammar gets done. Thanks Tom ... your comment makes me suspect I've been barking up the wrong tree. My original intent was to modify the grammar rules to catch an array expression followed by a typecast, and put the target typename of the cast directly into the A_ArrayExpr struct. That notion came from looking at the way that TypeName gets put into A_Const -- makeStringConst() takes an optional TypeName argument. Looking at the code in the context of your comment, that was probably a bad approach. I may've taken the A_Const analogy too far. Now I'm thinking I leave the grammar rules alone (apart from making it legal to specify an empty list of elements), and instead push the typename down into the child node from makeTypeCast(), if the child is an A_ArrayExpr. Does that work better? Regards, BJ
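The pushdown idea in the message above can be sketched in a few lines of C. This is a toy model only, not PostgreSQL source: `ToyNode`, `ToyTag` and `toy_make_typecast` are hypothetical names invented for illustration, and the choice to keep the cast node while also annotating the array child is an assumption, since the message does not say what happens to the cast itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for illustration; not PostgreSQL's NodeTag. */
typedef enum { TOY_CONST, TOY_ARRAY, TOY_TYPECAST } ToyTag;

typedef struct ToyNode {
    ToyTag tag;
    const char *type_name;   /* TOY_TYPECAST: target type of the cast;
                              * TOY_ARRAY: type pushed down from a cast, or NULL */
    struct ToyNode *arg;     /* TOY_TYPECAST: the expression being cast */
} ToyNode;

/*
 * Toy analogue of a makeTypeCast()-style helper. It fills in the
 * caller-provided `cast` node and, if the argument is an array constructor
 * that has no type yet, also records the target type on the array itself,
 * so that the array can "see" the cast sitting above it in the tree.
 */
ToyNode *toy_make_typecast(ToyNode *cast, ToyNode *arg, const char *type_name)
{
    assert(cast != NULL && arg != NULL && type_name != NULL);

    cast->tag = TOY_TYPECAST;
    cast->type_name = type_name;
    cast->arg = arg;

    /* Only the innermost cast annotates the array (assumption). */
    if (arg->tag == TOY_ARRAY && arg->type_name == NULL)
        arg->type_name = type_name;

    return cast;
}
```

Because both the '::' form and CAST() would be built through the same helper, a sketch like this treats them identically, which matches Tom's remark that the two look the same once the grammar is done.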
"Brendan Jurd" <direvus@gmail.com> writes: > Now I'm thinking I leave the grammar rules alone (apart from making it > legal to specify an empty list of elements), and instead push the > typename down into the child node from makeTypeCast(), if the child is > an A_ArrayExpr. Does that work better? Actually, if you do that you might as well forego the separate node type (which requires a nontrivial amount of infrastructure). I think it would work just about as well to have transformExpr check whether the argument of a TypeCast is an ArrayExpr, and if so call transformArrayExpr directly from there, passing the TypeName as an additional argument. Kinda ugly, but not really any worse than the way A_Const is handled in that same routine. (In fact, we could use the same technique to get rid of the typename field in A_Const ... might be worth doing?) regards, tom lane
On Nov 28, 2007 4:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Brendan Jurd" <direvus@gmail.com> writes: > > Now I'm thinking I leave the grammar rules alone (apart from making it > > legal to specify an empty list of elements), and instead push the > > typename down into the child node from makeTypeCast(), if the child is > > an A_ArrayExpr. Does that work better? > > Actually, if you do that you might as well forego the separate node type > (which requires a nontrivial amount of infrastructure). I think it > would work just about as well to have transformExpr check whether the > argument of a TypeCast is an ArrayExpr, and if so call > transformArrayExpr directly from there, passing the TypeName as an > additional argument. I actually thought that A_ArrayExpr would be a good addition even if you ignore the matter of typecasting. It always seemed weird to me that the parser generates an ArrayExpr directly. ArrayExpr has a bunch of members that are only set by the transform; all the parser does is set the 'elements' member. And then the transform creates a brand new ArrayExpr and populates it based on what's in the 'elements' member of the otherwise-empty ArrayExpr passed to it. So my feeling is that an A_ArrayExpr is a better fit for the parser output than ArrayExpr, and more in keeping with how the rest of the code does things. Mind you I'm also okay with your suggestion to let transformExpr take care of it. But I'm not adverse to putting in the legwork to set up the infrastructure for A_ArrayExpr, if it's a nice outcome. > Kinda ugly, but not really any worse than the way > A_Const is handled in that same routine. (In fact, we could use the > same technique to get rid of the typename field in A_Const ... might > be worth doing?) I had a bit of a dig into this. A_Const->typename gets set directly by the parse paths for "INTERVAL [(int)] string [interval range]". In fact, as far as I can tell that's the _only_ place A_Const->typename gets used at all. 
And all the transform does with that piece of information is treat the node like a typecast. I'm not seeing a huge amount of value in this special treatment. Why not just have the parser build this as an A_Const inside a TypeCast and then let the transform deal with it in the usual way?

I found the following comment at parsenodes.h:244

 * NOTE: for mostly historical reasons, A_Const parsenodes contain
 * room for a TypeName; we only generate a separate TypeCast node if the
 * argument to be casted is not a constant. In theory either representation
 * would work, but the combined representation saves a bit of code in many
 * productions in gram.y.

However, this is no longer the case. makeTypeCast() doesn't care about whether its argument is a constant anymore:

 * Earlier we would determine whether an A_Const would
 * be acceptable, however Domains require coerce_type()
 * to process them -- applying constraints as required.

And in "many productions in gram.y", "many" == 2. Currently the combined representation requires more code than it saves.

So, I get the impression the use-case for A_Const->typename has become extinct. I think it could be removed with a minimum of fuss, and I'd be happy to include same with my patch (or, submit it as a separate patch; let me know your preference).

Regards,
BJ
"Brendan Jurd" <direvus@gmail.com> writes: > I actually thought that A_ArrayExpr would be a good addition even if > you ignore the matter of typecasting. It always seemed weird to me > that the parser generates an ArrayExpr directly. ArrayExpr has a > bunch of members that are only set by the transform; all the parser > does is set the 'elements' member. Well, that's a reasonable argument. And now that I think about it, a parser-only node type doesn't have nearly the support overhead that a full-fledged executable node does. So no objection to A_ArrayExpr if you want to do that. > I had a bit of a dig into this. A_Const->typename gets set directly > by the parse paths for "INTERVAL [(int)] string [interval range]". In > fact, as far as I can tell that's the _only_ place A_Const->typename > gets used at all. Uh, you missed quite a lot of others ... see CURRENT_DATE and a lot of other productions. regards, tom lane
On Nov 28, 2007 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I had a bit of a dig into this. A_Const->typename gets set directly > > by the parse paths for "INTERVAL [(int)] string [interval range]". In > > fact, as far as I can tell that's the _only_ place A_Const->typename > > gets used at all. > > Uh, you missed quite a lot of others ... see CURRENT_DATE and a lot of > other productions. > Thanks again. I missed those because they don't use makeStringConst(). Looking again, it turns out "many productions" is more like 15. That's a bigger number, certainly, but it's still manageable. It wouldn't be hard to convert them to generate a const-in-a-cast. In fact with the addition of a makeCastStringConst(), I think the code saving from A_Const->typename would be cancelled out. If the only reason for keeping A_Const->typename around is the alleged code saving (as indicated by the code comments), my offer to do away with it is still on the table. Regards, BJ
Brendan Jurd escribió: > If the only reason for keeping A_Const->typename around is the alleged > code saving (as indicated by the code comments), my offer to do away > with it is still on the table. Code cleanup is always welcome. -- Alvaro Herrera Developer, http://www.PostgreSQL.org/ "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)
Hi folks,

The patch is coming along nicely now. I do have a couple of questions about the implementation in transformArrayExpr though.

----

1) How should we determine whether the array is multidimensional if we know the type in advance?

Currently, transformArrayExpr uses the results of its search for a common element type to figure out whether the array is multidimensional. If we know the type in advance, we don't need to do the common type search (a nice side-effect), so we need some other way of figuring out how to set ArrayExpr->multidims on the new node.

I could just check the nodeTag of the elements as they are transformed, but I'm concerned that the existing code might be relying on select_common_type to catch stupid input, like a mixture of scalar and array elements. If that's the case it might be unwise to bypass select_common_type or, at least, I'd need to come up with something else to provide the same level of sanity assurance in both code paths.

----

2) Should the typecast propagate downwards into nested array elements?

If we have a nested array written as, say, ARRAY[ARRAY[1, 2], ARRAY[3, 4], ARRAY[5, 6]]::float[], should we treat the inner arrays the same way as the outer array (with the advance knowledge that the array type should be float[])?

If I'm reading the code correctly, the end result should be much the same, because the inner arrays will end up being coerced to float[] anyway. But shortcutting the coercion could save some cycles.

Comments?

Regards,
BJ
On Fri, Nov 30, 2007 at 06:13:20AM +1100, Brendan Jurd wrote:
> Hi folks,
>
> The patch is coming along nicely now. I do have a couple of questions
> about the implementation in transformArrayExpr though.

Awesome.

> 1) How should we determine whether the array is multidimensional if we
> know the type in advance?

Well, given the array should be regular, you should be able to just look at the first element; if it's an array, look at its first element, and so on, to determine the dimensions. This'll be fairly quick.

> 2) Should the typecast propagate downwards into nested array elements?

IMHO yes, you have the info so you may as well use it.

> If we have a nested array written as, say, ARRAY[ARRAY[1, 2], ARRAY[3,
> 4], ARRAY[5, 6]]::float[], should we treat the inner arrays the same
> way as the outer array (with the advance knowledge that the array type
> should be float[])?

TBH, I think you're going to have to go through the whole array to coerce them and check, so you may as well determine the dimensions at the same time.

In general I think it's better to mark the type up front. I don't know if you should actually do the conversion straight away, but at least you don't need to guess the type anymore.

Hope this helps,

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy
Martijn van Oosterhout <kleptog@svana.org> writes: >> 1) How should we determine whether the array is multidimensional if we >> know the type in advance? > Well, given the array should be regular you should be able to just look > at the first element, if it's a array look at it's first element, etc > to determine the dimensions. This'll be fairly quick. How does that work with non-constant array constructor members? regards, tom lane
As discussed on -hackers, this patch allows the construction of an empty array if an explicit cast to an array type is given (as in, ARRAY[]::int[]).

postgres=# select array[]::int[];
 array
-------
 {}

postgres=# select array[];
ERROR: no target type for empty array
HINT: Empty arrays must be explictly cast to the desired array type, e.g. ARRAY[]::int[]

A few notes on the implementation:

 * The syntax now allows an ARRAY constructor with an empty expression list (array_expr_list may be empty).

 * I've added a new parsenode for arrays, A_ArrayExpr (previously the parser would create ArrayExpr primnodes).

 * transformArrayExpr() now takes two extra arguments, a type oid and a typmod. When transforming a typecast which casts an A_ArrayExpr to an array type, transformExpr passes these type details down to transformArrayExpr, and skips the typecast.

 * transformArrayExpr() behaves slightly differently when passed type information. The overall type of the array is set to the given type, and all elements are explicitly coerced to the equivalent element type. If it was not passed a type, then the behaviour is as before; the function looks for a common type among the elements, and coerces them to that type. The overall type of the array is derived from the common element type.

The patch is very invasive (at least compared to any of my previous patches), but so far I haven't managed to find any broken behaviour. All regression tests pass, and the regression tests for arrays seem to be quite comprehensive. I did add a couple of new tests for the empty array behaviours, but the rest I've left alone.

I look forward to your comments -- although given the length of the 8.4 patch review queue, that will probably be an exercise in extreme patience!

Major thanks go out to Tom for all his guidance on -hackers while I developed the patch.

Regards,
BJ
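The two code paths described for transformArrayExpr() (a type handed down from a cast versus a search for a common element type) can be illustrated with a small C sketch. It is a toy model under stated assumptions, not the patch itself: `ToyType` and `toy_resolve_element_type` are hypothetical names, only two element types exist, and "common type" is reduced to "float if any element is float, else int".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element types; TOY_NONE means "no type supplied". */
typedef enum { TOY_NONE, TOY_INT, TOY_FLOAT } ToyType;

/*
 * Decide the element type of an array constructor and coerce every
 * element to it (here: overwrite its recorded type).
 *
 * target != TOY_NONE : a cast supplied the type, so use it directly and
 *                      skip the common-type search.
 * target == TOY_NONE : fall back to a common-type search; an empty array
 *                      then has nothing to go on, and TOY_NONE is returned
 *                      to signal "no target type for empty array".
 */
ToyType toy_resolve_element_type(ToyType *elem_types, size_t n, ToyType target)
{
    ToyType result = target;
    size_t i;

    for (i = 0; i < n; i++)
        assert(elem_types[i] != TOY_NONE);   /* every element has a type */

    if (result == TOY_NONE) {
        if (n == 0)
            return TOY_NONE;                 /* empty and uncast: error */
        result = TOY_INT;
        for (i = 0; i < n; i++)              /* stand-in for the real search */
            if (elem_types[i] == TOY_FLOAT)
                result = TOY_FLOAT;
    }

    for (i = 0; i < n; i++)                  /* coerce all elements */
        elem_types[i] = result;

    return result;
}
```

In this model an empty element list succeeds only when a target type is passed in, which mirrors the ARRAY[]::int[] versus bare ARRAY[] behaviour shown in the session above.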
"Brendan Jurd" <direvus@gmail.com> writes: > The patch is very invasive (at least compared to any of my previous > patches), but so far I haven't managed to find any broken behaviour. I'm sorry to suggest anything at this point, but... would it be less invasive if instead of requiring the immediate cast you created a special case in the array code to allow a placeholder object for "empty array of unknown type". The only operation which would be allowed on it would be to cast it to some specific array type. That way things like UPDATE foo SET col = array[]; INSERT INTO foo (col) VALUES (array[]); could be allowed if they could be contrived to introduce an assignment cast. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
On Nov 30, 2007 9:09 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> I'm sorry to suggest anything at this point, but... would it be less invasive
> if instead of requiring the immediate cast you created a special case in the
> array code to allow a placeholder object for "empty array of unknown type".
> The only operation which would be allowed on it would be to cast it to some
> specific array type.
>
> That way things like
>
> UPDATE foo SET col = array[];
> INSERT INTO foo (col) VALUES (array[]);
>
> could be allowed if they could be contrived to introduce an assignment cast.

Hi Gregory.

Not sure it would be less invasive, but I do like the outcome of being able to create an empty array pending assignment. In addition to your examples, it might also make it possible to do things like this in plpgsql

    DECLARE
        a text[] := array[];

Whereas my patch requires you to write

        a text[] := array[]::text[];

... which seems pretty stupid.

So, I like your idea a lot from a usability point of view. But I really, really hate it from a "just spent half a week on this patch" point of view =/

Any suggestions about how you would enforce the "only allow casts to array types" restriction on the empty array?

Cheers
BJ
A quick recap: I submitted a patch for empty ARRAY[] syntax back in November, and as far as I can see it never made it to the patches list. Gregory suggested a different way of approaching the problem (quoted below), but nobody commented further about how it might be made to work.

I'd like to RFC again on Gregory's idea, and if that doesn't bear any fruit I'd like to submit the patch as-is for review.

Regards,
BJ

On 01/12/2007, Brendan Jurd <direvus@gmail.com> wrote:
> On Nov 30, 2007 9:09 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> > I'm sorry to suggest anything at this point, but... would it be less invasive
> > if instead of requiring the immediate cast you created a special case in the
> > array code to allow a placeholder object for "empty array of unknown type".
> > The only operation which would be allowed on it would be to cast it to some
> > specific array type.
> >
> > That way things like
> >
> > UPDATE foo SET col = array[];
> > INSERT INTO foo (col) VALUES (array[]);
> >
> > could be allowed if they could be contrived to introduce an assignment cast.
>
> Not sure it would be less invasive, but I do like the outcome of being
> able to create an empty array pending assignment. In addition to your
> examples, it might also make it possible to do things like this in
> plpgsql
>
> DECLARE
>     a text[] := array[];
>
> Whereas my patch requires you to write
>
>     a text[] := array[]::text[];
>
> ... which seems pretty stupid.
> ...
> Any suggestions about how you would enforce the "only allow casts to
> array types" restriction on the empty array?
>
"Brendan Jurd" <direvus@gmail.com> writes:
> A quick recap: I submitted a patch for empty ARRAY[] syntax back in
> November, and as far as I can see it never made it to the patches
> list. Gregory suggested a different way of approaching the problem
> (quoted below), but nobody commented further about how it might be
> made to work.

> I'd like to RFC again on Gregory's idea, and if that doesn't bear any
> fruit I'd like to submit the patch as-is for review.

Greg's idea is basically to invent array-of-UNKNOWN as a genuine datatype, which as I stated way back when seems fairly dangerous to me. UNKNOWN is already a pretty slippery animal, and I don't know what cast paths we might open up by doing that. I think the require-a-cast solution is a lot less likely to result in unforeseen side-effects.

>> Whereas my patch requires you to write
>>     a text[] := array[]::text[];
>> ... which seems pretty stupid.

In practice you'd write

    DECLARE
        a text[] := '{}';

which is even shorter, so I don't find this convincing.

regards, tom lane
"Brendan Jurd" <direvus@gmail.com> writes:
> As discussed on -hackers, this patch allows the construction of an
> empty array if an explicit cast to an array type is given (as in,
> ARRAY[]::int[]).

Applied with minor fixes; mostly, ensuring that the cast action would propagate down to sub-arrays, as in

regression=# select array[[1],[2.2]]::int[];
   array
-----------
 {{1},{2}}
(1 row)

I was interested to realize that this fix validates the decision to pass down the type information on-the-fly during transformExpr recursion. It would have been a lot more painful to do it if we'd taken the A_Const approach.

I didn't do anything about removing A_Const's typename field, but I'm thinking that would be a good cleanup patch.

regards, tom lane
On 21/03/2008, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Brendan Jurd" <direvus@gmail.com> writes: > > > As discussed on -hackers, this patch allows the construction of an > > empty array if an explicit cast to an array type is given (as in, > > ARRAY[]::int[]). > > > Applied with minor fixes; mostly, ensuring that the cast action would > propagate down to sub-arrays, as in Great, thanks Tom. > I was interested to realize that this fix validates the decision to > pass down the type information on-the-fly during transformExpr recursion. > It would have been a lot more painful to do it if we'd taken the A_Const > approach. > Indeed. > I didn't do anything about removing A_Const's typename field, but I'm > thinking that would be a good cleanup patch. > I'd be happy to take this on. My day job is pretty busy at the moment but I should be able to submit something in a week or so. Cheers, BJ
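The sub-array propagation Tom mentions (array[[1],[2.2]]::int[] yielding {{1},{2}}) amounts to a recursive walk that applies the cast's element type to every leaf. The C sketch below is a toy illustration only, not the committed code: `ToyNode` and `toy_cast_to_int` are hypothetical names, leaves hold plain doubles, and round-to-nearest is used as a simple stand-in for the real numeric-to-integer coercion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree: a node is either a scalar leaf or an array of nodes. */
typedef struct ToyNode {
    int is_array;             /* nonzero: array node; zero: scalar leaf */
    double value;             /* leaf only: the literal's value */
    int is_int;               /* leaf only: set once coerced to int */
    struct ToyNode **items;   /* array only: child nodes */
    size_t n_items;           /* array only: number of children */
} ToyNode;

/*
 * Apply an int[] cast to a (possibly nested) array constructor. The cast's
 * element type is carried down through every sub-array during the
 * recursion, so each leaf is coerced exactly once, where it is found.
 */
void toy_cast_to_int(ToyNode *node)
{
    size_t i;

    assert(node != NULL);

    if (node->is_array) {
        for (i = 0; i < node->n_items; i++)
            toy_cast_to_int(node->items[i]);   /* propagate downwards */
        return;
    }

    /* Leaf: round to the nearest whole number (stand-in coercion). */
    node->value = (double) (long) (node->value < 0 ? node->value - 0.5
                                                   : node->value + 0.5);
    node->is_int = 1;
}
```

Passing the type down during the recursion is what lets the inner constructors skip any type guessing of their own; in this toy the leaf 2.2 becomes 2, in line with the {{1},{2}} result quoted above.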