Thread: json api WIP patch
Here is a patch for the first part of the JSON API that was recently discussed. It includes the json parser hook infrastructure and functions for json_get and friends, plus json_keys.

As is, this exposes the json lexer fully for use by the hook functions. But I could easily be persuaded that this should be an opaque structure with some constructor and getter functions - I don't think the hook functions need or should be able to set anything in the lexer.

Work is proceeding on some of the more advanced functionality discussed.

cheers

andrew
Attachments
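The accessors named in the message above (json_get and friends, json_keys) are not shown in use anywhere in it. A sketch of what calls might look like, based only on the names mentioned in this thread - the exact signatures are those of the posted patch and could differ:

```sql
-- Hypothetical usage of the accessors named above.
-- json_get fetches an object field (or array element), returning json;
-- json_keys returns the keys of the top-level object as a set of text.
SELECT json_get('{"a": {"b": 1}}', 'a');   -- {"b": 1}
SELECT json_keys('{"a": 1, "b": 2}');      -- a and b, one per row
```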
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Here is a patch for the first part of the JSON API that was recently
> discussed. It includes the json parser hook infrastructure and functions
> for json_get and friends, plus json_keys.
>
> As is, this exposes the json lexer fully for use by the hook functions. But
> I could easily be persuaded that this should be an opaque structure with
> some constructor and getter functions - I don't think the hook functions
> need or should be able to set anything in the lexer.
>
> Work is proceeding on some of the more advanced functionality discussed.

This seems to contain a large number of spurious whitespace changes. And maybe some other spurious changes. For example, I'm not sure why this comment is truncated:

      }
      }
!     /*
!      * Check for trailing garbage. As in json_lex(), any alphanumeric stuff
!      * here should be considered part of the token for error-reporting
!      * purposes.
!      */
      for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
          error = true;
      lex->token_terminator = p;
      if (error)
          report_invalid_token(lex);
--- 730,739 ----
      }
      }
!     /* Check for trailing garbage. */
      for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
          error = true;
+     lex->prev_token_terminator = lex->token_terminator;
      lex->token_terminator = p;
      if (error)
          report_invalid_token(lex);

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/02/2013 04:45 PM, Robert Haas wrote:
> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Here is a patch for the first part of the JSON API that was recently
>> discussed. It includes the json parser hook infrastructure and functions
>> for json_get and friends, plus json_keys.
>>
>> As is, this exposes the json lexer fully for use by the hook functions. But
>> I could easily be persuaded that this should be an opaque structure with
>> some constructor and getter functions - I don't think the hook functions
>> need or should be able to set anything in the lexer.
>>
>> Work is proceeding on some of the more advanced functionality discussed.
> This seems to contain a large number of spurious whitespace changes.

I'm glad you're looking at it :-)

I did do a run of pgindent on the changed files before I cut the patch, which might have made some of those changes. I notice a couple of other infelicities too, which are undoubtedly my fault. The original prototype of this was cut against some older code, and I then tried to merge it with the current code base, and make sure that all the regression tests passed. That might well have resulted in a number of things that need review.

> And maybe some other spurious changes. For example, I'm not sure why
> this comment is truncated:

Another good question. I'll make another pass over this and try to remove some of what's annoying you.

cheers

andrew
On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>
> On 01/02/2013 04:45 PM, Robert Haas wrote:
>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
>> wrote:
>>> Here is a patch for the first part of the JSON API that was recently
>>> discussed. It includes the json parser hook infrastructure and
>>> functions
>>> for json_get and friends, plus json_keys.

Updated patch that contains most of the functionality I'm after. One piece left is populate_recordset (populate a set of records from a single json datum which is an array of objects, in one pass). That requires a bit of thought.

I hope most of the whitespace issues are fixed.

cheers

andrew
Attachments
Hello

2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
>
> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>
>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>
>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
>>> wrote:
>>>>
>>>> Here is a patch for the first part of the JSON API that was recently
>>>> discussed. It includes the json parser hook infrastructure and
>>>> functions
>>>> for json_get and friends, plus json_keys.
>
> Updated patch that contains most of the functionality I'm after. One piece
> left is populate_recordset (populate a set of records from a single json
> datum which is an array of objects, in one pass). That requires a bit of
> thought.
>
> I hope most of the whitespace issues are fixed.

it is looking well

I have one note - is "json_each" a good name?

Regards

Pavel

> cheers
>
> andrew
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
On 01/04/2013 03:36 PM, Pavel Stehule wrote:
> Hello
>
> 2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
>> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>>
>>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
>>>> wrote:
>>>>> Here is a patch for the first part of the JSON API that was recently
>>>>> discussed. It includes the json parser hook infrastructure and
>>>>> functions
>>>>> for json_get and friends, plus json_keys.
>>
>> Updated patch that contains most of the functionality I'm after. One piece
>> left is populate_recordset (populate a set of records from a single json
>> datum which is an array of objects, in one pass). That requires a bit of
>> thought.
>>
>> I hope most of the whitespace issues are fixed.
>
> it is looking well
>
> I have one note - is "json_each" a good name?

Possibly not, although hstore has each(). json_unnest might be even less felicitous.

I'm expecting a good deal of bikeshedding - I'm trying to ignore those issues for the most part because the functionality seems much more important to me than the names.

The more people that pound on it and try to break it the happier I'll be.

cheers

andrew
2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
>
> On 01/04/2013 03:36 PM, Pavel Stehule wrote:
>>
>> Hello
>>
>> 2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
>>>
>>> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>>>
>>>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>>>
>>>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
>>>>> wrote:
>>>>>>
>>>>>> Here is a patch for the first part of the JSON API that was recently
>>>>>> discussed. It includes the json parser hook infrastructure and
>>>>>> functions
>>>>>> for json_get and friends, plus json_keys.
>>>
>>> Updated patch that contains most of the functionality I'm after. One piece
>>> left is populate_recordset (populate a set of records from a single json
>>> datum which is an array of objects, in one pass). That requires a bit of
>>> thought.
>>>
>>> I hope most of the whitespace issues are fixed.
>>>
>> it is looking well
>>
>> I have one note - is "json_each" a good name?
>
> Possibly not, although hstore has each(). json_unnest might be even less
> felicitous.

I understand - but hstore isn't in core - so it should not be a precedent.

regexp_split_to_table

I am not a native speaker, and it sounds a little strange to me - but maybe that is just because I am not a native speaker :)

Regards

Pavel

> I'm expecting a good deal of bikeshedding - I'm trying to ignore those
> issues for the most part because the functionality seems much more important
> to me than the names.
>
> The more people that pound on it and try to break it the happier I'll be.
>
> cheers
>
> andrew
On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I understand - but hstore isn't in core - so it should not be a precedent
>
> regexp_split_to_table
>
> I am not a native speaker, it sounds a little strange - but maybe
> because I am not a native speaker :)

it's common usage: http://api.jquery.com/jQuery.each/

the patch looks fabulous. There are a few trivial whitespace issues yet and I noticed a leaked hstore comment @ 2440:

+ /*
+  * if the input hstore is empty, we can only skip the rest if we were
+  * passed in a non-null record, since otherwise there may be issues with
+  * domain nulls.
+  */

merlin
2013/1/7 Merlin Moncure <mmoncure@gmail.com>:
> On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I understand - but hstore isn't in core - so it should not be a precedent
>>
>> regexp_split_to_table
>>
>> I am not a native speaker, it sounds a little strange - but maybe
>> because I am not a native speaker :)
>
> it's common usage: http://api.jquery.com/jQuery.each/

ok

Regards

Pavel

> the patch looks fabulous. There are a few trivial whitespace issues
> yet and I noticed a leaked hstore comment @ 2440:
> + /*
> +  * if the input hstore is empty, we can only skip the rest if we were
> +  * passed in a non-null record, since otherwise there may be issues with
> +  * domain nulls.
> +  */
>
> merlin
On 01/07/2013 10:25 AM, Merlin Moncure wrote:
> the patch looks fabulous. There are a few trivial whitespace issues
> yet and I noticed a leaked hstore comment @ 2440:
> + /*
> +  * if the input hstore is empty, we can only skip the rest if we were
> +  * passed in a non-null record, since otherwise there may be issues with
> +  * domain nulls.
> +  */

Here is a patch that has all the functionality I'm intending to provide. The API is improved some, with the parser now keeping track of the nesting levels instead of callers having to do so, and a constructor function provided for JsonLexContext objects.

The processing functions have been extended to provide populate_record() and populate_recordset() functions. The latter in particular could be useful in decomposing a piece of json representing an array of flat objects (a fairly common pattern) into a set of Postgres records in a single pass.

The main thing I'm going to concentrate on now is making sure that this doesn't leak memory. I'm sure there's some tightening required in that area. Any eyeballs there will be greatly appreciated. There are also a couple of very minor things to clean up.

You (Merlin) have kindly volunteered to work on documentation, so before we go too far with that any bikeshedding on names, or on the functionality being provided, should now take place.

cheers

andrew
Attachments
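The populate_recordset idea announced above - one json array of flat objects decomposed into typed records in a single pass - would be used along these lines (a sketch; the composite type and the data are invented for illustration, and the signature is the one listed later in this thread):

```sql
-- Hypothetical composite type matching the object shape.
CREATE TYPE person AS (name text, age int);

-- One pass over the array yields one typed record per object.
SELECT *
FROM json_populate_recordset(
    null::person,
    '[{"name":"alice","age":30},{"name":"bob","age":41}]'
);
--  name  | age
-- -------+-----
--  alice |  30
--  bob   |  41
```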
> The processing functions have been extended to provide populate_record() and populate_recordset() functions. The latter
> in particular could be useful in decomposing a piece of json representing an array of flat objects (a fairly common
> pattern) into a set of Postgres records in a single pass.

So this would allow an 'insert into ... select ... from <unpack-the-JSON>(...)'?

I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided.

Is this mechanism likely to be as fast as we can get at the moment in contexts where copy is not feasible?
On 01/08/2013 01:45 AM, james wrote:
>> The processing functions have been extended to provide
>> populate_record() and populate_recordset() functions. The latter in
>> particular could be useful in decomposing a piece of json
>> representing an array of flat objects (a fairly common pattern) into
>> a set of Postgres records in a single pass.
>
> So this would allow an 'insert into ... select ... from
> <unpack-the-JSON>(...)'?

Yes.

> I had been wondering how to do such an insertion efficiently in the
> context of SPI, but it seems that there is no SPI_copy equiv that
> would allow a query parse and plan to be avoided.

Your query above would need to be planned too, although the plan will be trivial.

> Is this mechanism likely to be as fast as we can get at the moment in
> contexts where copy is not feasible?

You should not try to use it as a general bulk load facility. And it will not be as fast as COPY for several reasons, including that the Json parsing routines are necessarily much heavier than the COPY parse routines, which have in any case been optimized over quite a long period.

Also, a single json datum is limited to no more than 1Gb. If you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Then each object is decomposed into a hash table of key value pairs, which is then used to construct the record datum. Each field name in the result record is used to look up the value in the hash table - this happens once in the case of populate_record() and once per object in the array in the case of populate_recordset(). In the latter case the resulting records are put into a tuplestore structure (which spills to disk if necessary) which is then returned to the caller when all the objects in the json array are processed.

COPY doesn't have these sorts of issues. It knows without having to look things up where each datum is in each record, and it stashes the result straight into the target table. It can read and insert huge numbers of rows without significant memory implications.

Both these routines and COPY in non-binary mode use the data type input routines to convert text values. In some cases (very notably timestamps) these routines can easily be shown to be fantastically expensive compared to binary input. This is part of what has led to the creation of utilities like pg_bulkload.

Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.

cheers

andrew
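The 'insert into ... select ... from <unpack-the-JSON>' shape james asked about, spelled out as a sketch (the target table and its rowtype are hypothetical; as noted above, the INSERT still gets planned, but the plan is trivial and there is one statement per batch rather than per row):

```sql
-- Hypothetical target table; its rowtype doubles as the record template.
CREATE TABLE people (name text, age int);

-- Unpack a json array of objects and insert all rows in one statement.
INSERT INTO people (name, age)
SELECT name, age
FROM json_populate_recordset(
    null::people,
    '[{"name":"alice","age":30},{"name":"bob","age":41}]'
);
```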
>> I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no
>> SPI_copy equiv that would allow a query parse and plan to be avoided.
>
> Your query above would need to be planned too, although the plan will be trivial.

Ah yes, I meant that I had not found a way to avoid it (for multi-row inserts etc) from a stored proc context where I have SPI functions available.

> You should not try to use it as a general bulk load facility. And it will not be as fast as COPY for several reasons,
> including that the Json parsing routines are necessarily much heavier than the COPY parse routines, which have in any
> case been optimized over quite a long period. Also, a single json datum is limited to no more than 1Gb. If you have
> such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step -
> will take a look). Then each object is decomposed into a hash table of key value pairs, which is then used to
> construct the record datum. Each field name in the result record is used to look up the value in the hash table -
> this happens once in the case of populate_record() and once per object in the array in the case of
> populate_recordset(). In the latter case the resulting records are put into a tuplestore structure (which spills to
> disk if necessary) which is then returned to the caller when all the objects in the json array are processed. COPY
> doesn't have these sorts of issues. It knows without having to look things up where each datum is in each record,
> and it stashes the result straight into the target table. It can read and insert huge numbers of rows without
> significant memory implications.

Yes - but I don't think I can use COPY from a stored proc context, can I? If I could use binary COPY from a stored proc that has received a binary param and unpacked the data, it would be handy.

If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.

> Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.

I had been trying to identify a way to work with record sets where the records might be used for insert, or for update or deletion statements, preferably without forming a large custom SQL statement that must then be parsed and planned (and which would be a PITA if I wanted to use the SQL-C preprocessor or some language bindings that like to prepare a statement and execute with params).

The data I work with has a master-detail structure and insertion performance matters, so I'm trying to limit manipulations to one statement per table per logical operation even where there are multiple detail rows.

Sometimes the network latency can be a pain too and that also suggests an RPC with unpack and insert locally.

Cheers

James
On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
> If you have such a datum, parsing it involves having it in memory and
> then taking a copy (I wonder if we could avoid that step - will take a
> look).

Here is a Proof Of Concept patch against my development tip on what's involved in getting the JSON lexer not to need a nul-terminated string to parse. This passes regression, incidentally.

The downside is that processing is very slightly more complex, and that json_in() would need to call strlen() on its input. The upside would be that the processing routines I've been working on would no longer need to create copies of their json arguments using text_to_cstring() just so they can get a null-terminated string to process.

Consequent changes would modify the signature of makeJsonLexContext() so its first argument would be a text* instead of a char* (and of course its logic would change accordingly).

I could go either way. Thoughts?

cheers

andrew
On 01/08/2013 03:12 PM, Andrew Dunstan wrote:
>
> On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
>>
>> If you have such a datum, parsing it involves having it in memory and
>> then taking a copy (I wonder if we could avoid that step - will take
>> a look).
>
> Here is a Proof Of Concept patch against my development tip on what's
> involved in getting the JSON lexer not to need a nul-terminated string
> to parse. This passes regression, incidentally. The downside is that
> processing is very slightly more complex, and that json_in() would
> need to call strlen() on its input. The upside would be that the
> processing routines I've been working on would no longer need to
> create copies of their json arguments using text_to_cstring() just so
> they can get a null-terminated string to process.
>
> Consequent changes would modify the signature of makeJsonLexContext()
> so its first argument would be a text* instead of a char* (and of
> course its logic would change accordingly).
>
> I could go either way. Thoughts?

this time with patch ...
Attachments
On 01/08/2013 03:07 PM, james wrote:
> Yes - but I don't think I can use COPY from a stored proc context, can
> I? If I could use binary COPY from a stored proc that has received a
> binary param and unpacked the data, it would be handy.

You can use COPY from a stored procedure, but only to and from files.

> If SPI provided a way to perform a copy to a temp table and then some
> callback on an iterator that yields rows to it, that would do the
> trick I guess.

SPI is useful, but it's certainly possible to avoid its use. After all, that's what almost the whole backend does, including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency has its price.

cheers

andrew
On 1/7/13 5:15 PM, Andrew Dunstan wrote:
> You (Merlin) have kindly volunteered to work on documentation, so before
> we go too far with that any bikeshedding on names, or on the
> functionality being provided, should now take place.

Hmm, I was going to say, this patch contains no documentation, so I have no idea what it is supposed to do. "Recently discussed" isn't a good substitute for describing what the patch is supposed to accomplish.
> You can use COPY from a stored procedure, but only to and from files.

I think that's in the chocolate fireguard realm though as far as efficiency for this sort of scenario goes, even if it's handled by retaining an mmap'd file as workspace.

>> If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it,
>> that would do the trick I guess.
>
> SPI is useful, but it's certainly possible to avoid its use. After all, that's what almost the whole backend does,
> including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency
> has its price.

So it is possible to use a lower level interface from a C stored proc? SPI is the (only) documented direct function extension API, isn't it?

Is the issue with using the JSON data-to-record set that the parsing can be costly? Perhaps it can be achieved with B64 of compressed protobuf, or such. I don't mind if it seems a bit messy - the code can be generated from the table easily enough, especially if I can use C++.

I guess an allocator that uses SPI_palloc would solve issues with memory management on error?
On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 1/7/13 5:15 PM, Andrew Dunstan wrote:
>> You (Merlin) have kindly volunteered to work on documentation, so before
>> we go too far with that any bikeshedding on names, or on the
>> functionality being provided, should now take place.
>
> Hmm, I was going to say, this patch contains no documentation, so I have
> no idea what it is supposed to do. "Recently discussed" isn't a good
> substitute for describing what the patch is supposed to accomplish.

Why not? There are functional examples in the docs and the purpose of the various functions was hashed out pretty well a couple weeks back, deficiencies corrected, etc.

reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html

merlin
On 01/08/2013 04:32 PM, Merlin Moncure wrote:
> On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 1/7/13 5:15 PM, Andrew Dunstan wrote:
>>> You (Merlin) have kindly volunteered to work on documentation, so before
>>> we go too far with that any bikeshedding on names, or on the
>>> functionality being provided, should now take place.
>> Hmm, I was going to say, this patch contains no documentation, so I have
>> no idea what it is supposed to do. "Recently discussed" isn't a good
>> substitute for describing what the patch is supposed to accomplish.
> Why not? There are functional examples in the docs and the purpose of
> the various functions was hashed out pretty well a couple weeks back,
> deficiencies corrected, etc.
>
> reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html

Well, at a high level the patch is meant to do two things: provide an API that can be used to build JSON processing functions easily, and provide some basic json processing functions built on the API. Those functions provide similar capabilities to the accessor functions that hstore has.

Perhaps also this will help. Here is the list of functions and operators as currently implemented. I also have working operators for the get_path functions which will be in a future patch. All these are used in the included regression tests.
           Name           | Result data type |                          Argument data types
--------------------------+------------------+-------------------------------------------------------------------------
 json_array_length        | integer          | json
 json_each                | SETOF record     | from_json json, OUT key text, OUT value json
 json_each_as_text        | SETOF record     | from_json json, OUT key text, OUT value text
 json_get                 | json             | json, integer
 json_get                 | json             | json, text
 json_get_as_text         | text             | json, integer
 json_get_as_text         | text             | json, text
 json_get_path            | json             | from_json json, VARIADIC path_elems text[]
 json_get_path_as_text    | text             | from_json json, VARIADIC path_elems text[]
 json_object_keys         | SETOF text       | json
 json_populate_record     | anyelement       | anyelement, json
 json_populate_recordset  | SETOF anyelement | base anyelement, from_json json, use_json_as_text boolean DEFAULT false
 json_unnest              | SETOF json       | from_json json, OUT value json

 Name | Left arg type | Right arg type | Result type |          Description
------+---------------+----------------+-------------+--------------------------------
 ->   | json          | integer        | json        | get json array element
 ->   | json          | text           | json        | get json object field
 ->>  | json          | integer        | text        | get json array element as text
 ->>  | json          | text           | text        | get json object field as text

cheers

andrew
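Reading the operator table above together with the zero-based array indexing discussed elsewhere in this thread, usage would look roughly like this (a sketch against the posted patch, not a released PostgreSQL):

```sql
SELECT '[10,20,30]'::json -> 1;      -- second element, since indexes are zero-based
SELECT '{"a":"x"}'::json -> 'a';     -- the field as json: "x"
SELECT '{"a":"x"}'::json ->> 'a';    -- the field as de-escaped text: x
```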
On Mon, Jan 7, 2013 at 4:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> You (Merlin) have kindly volunteered to work on documentation, so before we
> go too far with that any bikeshedding on names, or on the functionality
> being provided, should now take place.

Barring comment/complaint, I'm just going to roll with what we've got. It seems pretty good to me.

merlin
On 01/04/2013 03:23 PM, Andrew Dunstan wrote:
>
> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>
>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan
>>> <andrew@dunslane.net> wrote:
>>>> Here is a patch for the first part of the JSON API that was recently
>>>> discussed. It includes the json parser hook infrastructure and
>>>> functions
>>>> for json_get and friends, plus json_keys.
>
> Updated patch that contains most of the functionality I'm after. One
> piece left is populate_recordset (populate a set of records from a
> single json datum which is an array of objects, in one pass). That
> requires a bit of thought.
>
> I hope most of the whitespace issues are fixed.

This updated patch contains all the intended functionality, including operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2'] ...

It also removes any requirement to copy the json value before setting up the lexer by removing the lexer's requirement to have a nul terminated string. Instead the lexer is told the input length and relies on that. For this reason, json_in() now calls cstring_to_text() before rather than after calling the validation routine, but that's really not something worth bothering about.

A couple of points worth noting: it's a pity that we have to run CREATE OR REPLACE FUNCTION in system_views.sql in order to set up default values for builtin functions. That feels very kludgy. Also, making operators for variadic functions is a bit of a pain. I had to set up non-variadic versions of the same functions (see json_get_path_op and json_get_path_as_text_op) just so I could set up the operators. Neither of these are exactly showstopper items, just mild annoyances.

I will continue hunting memory leaks, but when Merlin gets done with docco I think we'll be far enough advanced to add this to the commitfest.

cheers

andrew
Attachments
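The jsonval->array[...] form shown in the message above walks a path one element at a time, with text elements serving as object keys and numeric-looking elements as array indexes. A concrete (hypothetical) example of the same shape:

```sql
-- 'f1' selects an object field, '0' the first array element (zero-based),
-- 'f2' a field of that element.
SELECT '{"f1": [{"f2": 42}]}'::json -> array['f1','0','f2'];
-- would yield 42 as json
```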
On Thu, Jan 10, 2013 at 6:42 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Updated patch that contains most of the functionality I'm after. One piece
>> left is populate_recordset (populate a set of records from a single json
>> datum which is an array of objects, in one pass). That requires a bit of
>> thought.
>>
>> I hope most of the whitespace issues are fixed.
>
> This updated patch contains all the intended functionality, including
> operators for the json_get_path functions, so you can say things like
>
>     select jsonval->array['f1','0','f2'] ...
>
> It also removes any requirement to copy the json value before setting up the
> lexer by removing the lexer's requirement to have a nul terminated string.
> Instead the lexer is told the input length and relies on that. For this
> reason, json_in() now calls cstring_to_text() before rather than after
> calling the validation routine, but that's really not something worth
> bothering about.
>
> A couple of points worth noting: it's a pity that we have to run CREATE OR
> REPLACE FUNCTION in system_views.sql in order to set up default values for
> builtin functions. That feels very kludgy. Also, making operators for
> variadic functions is a bit of a pain. I had to set up non-variadic versions
> of the same functions (see json_get_path_op and json_get_path_as_text_op)
> just so I could set up the operators. Neither of these are exactly
> showstopper items, just mild annoyances.
>
> I will continue hunting memory leaks, but when Merlin gets done with docco I
> think we'll be far enough advanced to add this to the commitfest.

So, how much performance does this lose on json_in() on a large cstring, as compared with master?

I can't shake the feeling that this is adding a LOT of unnecessary data copying. For one thing, instead of copying every single lexeme (including the single-character ones?) out of the original object, we could just store a pointer to the offset where the object starts and a length, instead of copying it.

This is also remarkably thin on comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/14/2013 11:32 AM, Robert Haas wrote:
> So, how much performance does this lose on json_in() on a large
> cstring, as compared with master?

That's a good question. I'll try to devise a test.

> I can't shake the feeling that this is adding a LOT of unnecessary
> data copying. For one thing, instead of copying every single lexeme
> (including the single-character ones?) out of the original object, we
> could just store a pointer to the offset where the object starts and a
> length, instead of copying it.

In the pure parse case (json_in, json_recv) nothing extra should be copied. On checking this after reading the above I found that wasn't quite the case, and some lexemes (scalars and field names, but not punctuation) were being copied when not needed. I have made a fix (see <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>) which I will include in the next version I publish.

In the case of string lexemes, we are passing back a de-escaped version, so just handing back pointers to the beginning and end in the input string doesn't work.

> This is also remarkably thin on comments.

Fair criticism. I'll work on that. Thanks for looking at this.

cheers

andrew
On Thu, Jan 10, 2013 at 5:42 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 01/04/2013 03:23 PM, Andrew Dunstan wrote: >> >> >> On 01/02/2013 05:51 PM, Andrew Dunstan wrote: >>> >>> >>> On 01/02/2013 04:45 PM, Robert Haas wrote: >>>> >>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> >>>> wrote: >>>>> >>>>> Here is a patch for the first part of the JSON API that was recently >>>>> discussed. It includes the json parser hook infrastructure and >>>>> functions >>>>> for json_get and friends, plus json_keys. >> >> >> >> Udated patch that contains most of the functionality I'm after. One piece >> left is populate_recordset (populate a set of records from a single json >> datum which is an array of objects, in one pass). That requires a bit of >> thought. >> >> I hope most of the whitespace issues are fixed. >> >> > > > This updated patch contains all the intended functionality, including > operators for the json_get_path functions, so you can say things like > > select jsonval->array['f1','0','f2] ... > > It also removes any requirement to copy the json value before setting up the > lexer by removing the lexer's requirement to have a nul terminated string. > Instead the lexer is told the input length and relies on that. For this > reason, json_in() now calls cstring_get_text() before rather than after > calling the validation routine, but that's really not something worth > bothering about. > > A couple of points worth noting: it's a pity that we have to run CREATE OR > REPLACE FUNCTION in system_views.sql in order to set up default values for > builtin functions. That feels very kludgy. Also, making operators for > variadic functions is a bit of a pain. I had to set up non-variadic version > of the same functions (see json_get_path_op and json_get_path_as_text_op) > just so I could set up the operators. Neither of these are exactly > showstopper items, just mild annoyances. 
> I will continue hunting memory leaks, but when Merlin gets done with
> docco I think we'll be far enough advanced to add this to the
> commitfest.

While testing this I noticed that integer based 'get' routines are
zero based -- was this intentional? Virtually all other aspects of
SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
 json_get
----------
 2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
 json_get
----------
 1
(1 row)

merlin
On 01/14/2013 07:36 PM, Merlin Moncure wrote:
> While testing this I noticed that integer based 'get' routines are
> zero based -- was this intentional? Virtually all other aspects of
> SQL are 1 based:
>
> postgres=# select json_get('[1,2,3]', 1);
>  json_get
> ----------
>  2
> (1 row)
>
> postgres=# select json_get('[1,2,3]', 0);
>  json_get
> ----------
>  1
> (1 row)

Yes, it's intentional. SQL arrays might be 1-based by default, but
JavaScript arrays are not. JsonPath and similar gadgets treat the
arrays as zero-based. I suspect the JSON-using community would not
thank us for being overly SQL-centric on this - and I say that as
someone who has always thought zero-based arrays were a major design
mistake, responsible for countless off-by-one errors.

cheers

andrew
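[Editor's note: the convention being defended here is simply the JavaScript/JSON one. The snippet below illustrates it with Python's json module standing in for any JSON-consuming client; it is not the patch's code.]

```python
# JSON arrays, like JavaScript arrays, are zero-based, whereas SQL
# arrays are 1-based by default - hence the mismatch Merlin noticed.
import json

arr = json.loads('[1,2,3]')
assert arr[0] == 1   # zero-based: the first element is at index 0
assert arr[1] == 2   # matches what json_get('[1,2,3]', 1) returned above
```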
On 01/14/2013 12:52 PM, Andrew Dunstan wrote: > > On 01/14/2013 11:32 AM, Robert Haas wrote: >> >> So, how much performance does this lose on json_in() on a large >> cstring, as compared with master? > > That's a good question. I'll try to devise a test. > >> >> I can't shake the feeling that this is adding a LOT of unnecessary >> data copying. For one thing, instead of copying every single lexeme >> (including the single-character ones?) out of the original object, we >> could just store a pointer to the offset where the object starts and a >> length, instead of copying it. > > In the pure pares case (json_in, json_reccv) nothing extra should be > copied. On checking this after reading the above I found that wasn't > quite the case, and some lexemes (scalars and field names, but not > punctuation) were being copied when not needed. I have made a fix (see > <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>) > which I will include in the next version I publish. > > In the case of string lexemes, we are passing back a de-escaped > version, so just handing back pointers to the beginning and end in the > input string doesn't work. After a couple of iterations, some performance enhancements to the json parser and lexer have ended up with a net performance improvement over git tip. On our test rig, the json parse test runs at just over 13s per 10000 parses on git tip and approx 12.55s per 10000 parses with the attached patch. Truth be told, I think the lexer changes have more than paid for the small cost of the switch to an RD parser. But since the result is a net performance win PLUS some enhanced functionality, I think we should be all good. cheers andrew
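[Editor's note: for readers unfamiliar with the term, an "RD parser" is a recursive-descent parser - one function per grammar production, calling each other recursively. Below is a deliberately tiny Python sketch of the style; the actual patch implements this in C inside the backend, with proper error reporting and string de-escaping, both of which this toy omits.]

```python
# A toy recursive-descent JSON parser, sketched only to illustrate the
# parsing style referred to above; every parse_* method corresponds to
# one production of the JSON grammar.
import re

class Parser:
    TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?'
                       r'|true|false|null|[{}\[\],:])')

    def __init__(self, text):
        self.tokens = self.TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_value(self):
        tok = self.next()
        if tok == '{':
            return self.parse_object()
        if tok == '[':
            return self.parse_array()
        if tok.startswith('"'):
            return tok[1:-1]          # (de-escaping omitted in this toy)
        if tok in ('true', 'false', 'null'):
            return {'true': True, 'false': False, 'null': None}[tok]
        return float(tok) if '.' in tok else int(tok)

    def parse_object(self):
        obj = {}
        if self.peek() == '}':
            self.next()
            return obj
        while True:
            key = self.next()[1:-1]   # strip quotes from the field name
            assert self.next() == ':'
            obj[key] = self.parse_value()
            if self.next() == '}':
                return obj            # otherwise the token was ','

    def parse_array(self):
        arr = []
        if self.peek() == ']':
            self.next()
            return arr
        while True:
            arr.append(self.parse_value())
            if self.next() == ']':
                return arr

print(Parser('{"a": [1, 2, {"b": true}]}').parse_value())
# → {'a': [1, 2, {'b': True}]}
```

The attraction of the style, as the thread suggests, is that the parser's structure follows the grammar directly, which makes it easy to attach per-production hooks - at the cost of one function call per nesting level.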
Attachments
On 01/14/2013 11:02 PM, Andrew Dunstan wrote: > > On 01/14/2013 12:52 PM, Andrew Dunstan wrote: >> >> On 01/14/2013 11:32 AM, Robert Haas wrote: >>> >>> So, how much performance does this lose on json_in() on a large >>> cstring, as compared with master? >> >> That's a good question. I'll try to devise a test. >> >>> >>> I can't shake the feeling that this is adding a LOT of unnecessary >>> data copying. For one thing, instead of copying every single lexeme >>> (including the single-character ones?) out of the original object, we >>> could just store a pointer to the offset where the object starts and a >>> length, instead of copying it. >> >> In the pure pares case (json_in, json_reccv) nothing extra should be >> copied. On checking this after reading the above I found that wasn't >> quite the case, and some lexemes (scalars and field names, but not >> punctuation) were being copied when not needed. I have made a fix >> (see >> <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>) >> which I will include in the next version I publish. >> >> In the case of string lexemes, we are passing back a de-escaped >> version, so just handing back pointers to the beginning and end in >> the input string doesn't work. > > > After a couple of iterations, some performance enhancements to the > json parser and lexer have ended up with a net performance improvement > over git tip. On our test rig, the json parse test runs at just over > 13s per 10000 parses on git tip and approx 12.55s per 10000 parses > with the attached patch. > > Truth be told, I think the lexer changes have more than paid for the > small cost of the switch to an RD parser. But since the result is a > net performance win PLUS some enhanced functionality, I think we > should be all good. > > Latest version of this patch, including some documentation, mainly from Merlin Moncure but tweaked by me. cheers andrew
Attachments
On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: > > On 01/14/2013 07:36 PM, Merlin Moncure wrote: > >While testing this I noticed that integer based 'get' routines are > >zero based -- was this intentional? Virtually all other aspects of > >SQL are 1 based: > > > >postgres=# select json_get('[1,2,3]', 1); > > json_get > >---------- > > 2 > >(1 row) > > > >postgres=# select json_get('[1,2,3]', 0); > > json_get > >---------- > > 1 > >(1 row) > > Yes. it's intentional. SQL arrays might be 1-based by default, but > JavaScript arrays are not. JsonPath and similar gadgets treat the > arrays as zero-based. I suspect the Json-using community would not > thank us for being overly SQL-centric on this - and I say that as > someone who has always thought zero based arrays were a major design > mistake, responsible for countless off-by-one errors. Perhaps we could compromise by making arrays 0.5-based. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david@fetter.org> wrote: > On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: >> >> On 01/14/2013 07:36 PM, Merlin Moncure wrote: >> >While testing this I noticed that integer based 'get' routines are >> >zero based -- was this intentional? Virtually all other aspects of >> >SQL are 1 based: >> > >> >postgres=# select json_get('[1,2,3]', 1); >> > json_get >> >---------- >> > 2 >> >(1 row) >> > >> >postgres=# select json_get('[1,2,3]', 0); >> > json_get >> >---------- >> > 1 >> >(1 row) >> >> Yes. it's intentional. SQL arrays might be 1-based by default, but >> JavaScript arrays are not. JsonPath and similar gadgets treat the >> arrays as zero-based. I suspect the Json-using community would not >> thank us for being overly SQL-centric on this - and I say that as >> someone who has always thought zero based arrays were a major design >> mistake, responsible for countless off-by-one errors. > > Perhaps we could compromise by making arrays 0.5-based. Well, I'm not prepared to argue with Andrew in this one. It was surprising behavior to me, but that's sample size one. merlin
On 01/15/2013 02:47 PM, Merlin Moncure wrote: > On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david@fetter.org> wrote: >> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: >>> On 01/14/2013 07:36 PM, Merlin Moncure wrote: >>>> While testing this I noticed that integer based 'get' routines are >>>> zero based -- was this intentional? Virtually all other aspects of >>>> SQL are 1 based: >>>> >>>> postgres=# select json_get('[1,2,3]', 1); >>>> json_get >>>> ---------- >>>> 2 >>>> (1 row) >>>> >>>> postgres=# select json_get('[1,2,3]', 0); >>>> json_get >>>> ---------- >>>> 1 >>>> (1 row) >>> Yes. it's intentional. SQL arrays might be 1-based by default, but >>> JavaScript arrays are not. JsonPath and similar gadgets treat the >>> arrays as zero-based. I suspect the Json-using community would not >>> thank us for being overly SQL-centric on this - and I say that as >>> someone who has always thought zero based arrays were a major design >>> mistake, responsible for countless off-by-one errors. >> Perhaps we could compromise by making arrays 0.5-based. > Well, I'm not prepared to argue with Andrew in this one. It was > surprising behavior to me, but that's sample size one. > I doubt I'm very representative either. People like David Wheeler, Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than me. I'm quite prepared to change it if that's the consensus. cheers andrew
On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 01/15/2013 02:47 PM, Merlin Moncure wrote: >> >> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david@fetter.org> wrote: >>> >>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: >>>> >>>> On 01/14/2013 07:36 PM, Merlin Moncure wrote: >>>>> >>>>> While testing this I noticed that integer based 'get' routines are >>>>> zero based -- was this intentional? Virtually all other aspects of >>>>> SQL are 1 based: >>>>> >>>>> postgres=# select json_get('[1,2,3]', 1); >>>>> json_get >>>>> ---------- >>>>> 2 >>>>> (1 row) >>>>> >>>>> postgres=# select json_get('[1,2,3]', 0); >>>>> json_get >>>>> ---------- >>>>> 1 >>>>> (1 row) >>>> >>>> Yes. it's intentional. SQL arrays might be 1-based by default, but >>>> JavaScript arrays are not. JsonPath and similar gadgets treat the >>>> arrays as zero-based. I suspect the Json-using community would not >>>> thank us for being overly SQL-centric on this - and I say that as >>>> someone who has always thought zero based arrays were a major design >>>> mistake, responsible for countless off-by-one errors. >>> >>> Perhaps we could compromise by making arrays 0.5-based. >> >> Well, I'm not prepared to argue with Andrew in this one. It was >> surprising behavior to me, but that's sample size one. > > I doubt I'm very representative either. People like David Wheeler, Taras > Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than > me. I'm quite prepared to change it if that's the consensus. Hello. I'm inclined to go with the same gut feeling you had (zero-based-indexing). Here is the background for my reasoning: The downside of zero-based-indexing is that people who want to use multiple sequential container types will inevitably have to deal with detailed and not easily type-checked integer coordinates that mean different things in each domain that will, no doubt, lead to a number of off-by-one errors. 
Nevertheless, this cost is already paid because one of the first things
many people will do in programs generating SQL queries is try to
zero-index a SQL array, swear a bit after figuring things out (because a
NULL will be generated, not an error), and then adjust all the offsets.
So, this is not a new problem. On many occasions I'm sure this has
caused off-by-one bugs, or the NULLs slipped through testing and
delivered funny results, yet the world moves on.

On the other hand, the downside of going down the road of 1-based
indexing and attempting to attain relative sameness to SQL arrays is
that it would also feel like one would be obliged to implement SQL array
infelicities like 'out of bounds' being SQL NULL rather than an error,
related to other spectres like non-rectangular nested arrays. SQL array
semantics are complex and The Committee can change them or -- slightly
more likely -- add interactions, so it seems like a general expectation
that Postgres container types that happen to have any reasonable ordinal
addressing will implement some level of same-ness with SQL arrays is a
very messy one. As such, if it becomes customary to implement one-based
indexing of containers, I think such customs are best carefully
circumscribed so that attempts to be 'like' SQL arrays are only as
superficial as that.

What made me come down on the side of zero-based indexing in spite of
the weaknesses are these two reasons:

* The number of people who use JSON and zero-based indexing is very
  large, and furthermore, within that set the number that know that SQL
  even defines array support -- much less that Postgres implements it --
  is much smaller. Thus, one is targeting cohesion with a fairly alien
  concept that is not understood by the audience.

* Maintaining PL-integrated code that uses both 1-based indexing in PG
  functions and 0-based indexing in embedded languages that are likely
  to be combined with JSON -- doesn't sound very palatable, and the use
  of such PLs (e.g.
plv8) seems pretty likely, too. That can probably be a rich source of
bugs and frustration.

If one wants SQL array semantics, it seems like the right way to get
them is coercion to a SQL array value. Then one will receive SQL array
semantics exactly.

-- fdr
<div class="moz-cite-prefix">On 16/01/13 08:04, David Fetter wrote:<br /></div><blockquote cite="mid:20130115190424.GB32407@fetter.org"type="cite"><pre wrap="">On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstanwrote: </pre><blockquote type="cite"><pre wrap=""> On 01/14/2013 07:36 PM, Merlin Moncure wrote: </pre><blockquote type="cite"><pre wrap="">While testing this I noticed that integer based 'get' routines are zero based -- was this intentional? Virtually all other aspects of SQL are 1 based: postgres=# select json_get('[1,2,3]', 1);json_get ----------2 (1 row) postgres=# select json_get('[1,2,3]', 0);json_get ----------1 (1 row) </pre></blockquote><pre wrap=""> Yes. it's intentional. SQL arrays might be 1-based by default, but JavaScript arrays are not. JsonPath and similar gadgets treat the arrays as zero-based. I suspect the Json-using community would not thank us for being overly SQL-centric on this - and I say that as someone who has always thought zero based arrays were a major design mistake, responsible for countless off-by-one errors. </pre></blockquote><pre wrap=""> Perhaps we could compromise by making arrays 0.5-based. Cheers, David. </pre></blockquote><font size="-1">I think that <font size="-1">is far to rational, perhaps the reciprocal of the goldenratio<font size="-1"> (0.618033...)</font> </font></font>would be more appropriate?<br /><br /> I used to be insistentthat arrays should start with 1, now I find starting at 0 far more natural - because evrytime you start an arrayat 1, the computer has to subtract 1 in order to calculate the entry. Also both Java & C are zero based.<br /><br/> I first learnt FORTRAN IV which is 1 based, had a shock when I was learning Algol and found it was 0 based - manymoons ago...<br /><br /> <br /> Cheers,<br /> Gavin<br />
On Jan 15, 2013, at 12:17 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

> I doubt I'm very representative either. People like David Wheeler,
> Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people
> to ask than me. I'm quite prepared to change it if that's the
> consensus.

They're JSON arrays, not SQL arrays, and JSON arrays are based on
JavaScript, where they are 0-based.

Best,

David
On 01/15/2013 11:31 AM, Andrew Dunstan wrote: > > On 01/14/2013 11:02 PM, Andrew Dunstan wrote: >> >> On 01/14/2013 12:52 PM, Andrew Dunstan wrote: >>> >>> On 01/14/2013 11:32 AM, Robert Haas wrote: >>>> >>>> So, how much performance does this lose on json_in() on a large >>>> cstring, as compared with master? >>> >>> That's a good question. I'll try to devise a test. >>> >>>> >>>> I can't shake the feeling that this is adding a LOT of unnecessary >>>> data copying. For one thing, instead of copying every single lexeme >>>> (including the single-character ones?) out of the original object, we >>>> could just store a pointer to the offset where the object starts and a >>>> length, instead of copying it. >>> >>> In the pure pares case (json_in, json_reccv) nothing extra should be >>> copied. On checking this after reading the above I found that wasn't >>> quite the case, and some lexemes (scalars and field names, but not >>> punctuation) were being copied when not needed. I have made a fix >>> (see >>> <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>) >>> which I will include in the next version I publish. >>> >>> In the case of string lexemes, we are passing back a de-escaped >>> version, so just handing back pointers to the beginning and end in >>> the input string doesn't work. >> >> >> After a couple of iterations, some performance enhancements to the >> json parser and lexer have ended up with a net performance >> improvement over git tip. On our test rig, the json parse test runs >> at just over 13s per 10000 parses on git tip and approx 12.55s per >> 10000 parses with the attached patch. >> >> Truth be told, I think the lexer changes have more than paid for the >> small cost of the switch to an RD parser. But since the result is a >> net performance win PLUS some enhanced functionality, I think we >> should be all good. 
>> >> > > Latest version of this patch, including some documentation, mainly > from Merlin Moncure but tweaked by me. > Now with more comments. cheers andrew
Attachments
On Mon, Jan 14, 2013 at 11:02 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > After a couple of iterations, some performance enhancements to the json > parser and lexer have ended up with a net performance improvement over git > tip. On our test rig, the json parse test runs at just over 13s per 10000 > parses on git tip and approx 12.55s per 10000 parses with the attached > patch. > > Truth be told, I think the lexer changes have more than paid for the small > cost of the switch to an RD parser. But since the result is a net > performance win PLUS some enhanced functionality, I think we should be all > good. Yeah, that sounds great. Thanks for putting in the effort. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 01/15/2013 05:45 PM, Andrew Dunstan wrote: >>> >> >> Latest version of this patch, including some documentation, mainly >> from Merlin Moncure but tweaked by me. >> > > > Now with more comments. > New version attached. The only change is to remove some unnecessary uses of PG_FUNCTION_INFO_V1 that were the result of overenthusiastic copy and paste. cheers andrew
Attachments
On 1/10/13 6:42 PM, Andrew Dunstan wrote:
> This updated patch contains all the intended functionality, including
> operators for the json_get_path functions, so you can say things like
>
> select jsonval->array['f1','0','f2'] ...

I would like to not create any -> operators, so that that syntax could
be used in the future for method invocation or something similar (it's
in the SQL standard).

I also don't find the proposed use to be very intuitive. You invented
lots of other function names -- why not invent a few more for this
purpose that are clearer?
On 01/31/2013 05:06 PM, Peter Eisentraut wrote: > On 1/10/13 6:42 PM, Andrew Dunstan wrote: >> This updated patch contains all the intended functionality, including >> operators for the json_get_path functions, so you can say things like >> >> select jsonval->array['f1','0','f2] ... > I would like to not create any -> operators, so that that syntax could > be used in the future for method invocation or something similar (it's > in the SQL standard). This is the first time I have heard that we should stay away from this. We have operators with this name in hstore, which is why I chose it. Have we officially deprecated '->'? I know we deprecated "=>", but I simply don't recall anything about '->'. > > I also don't find the proposed use to be very intuitive. You invented > lots of other function names -- why not invent a few more for this > purpose that are clearer? > > I'm happy to take opinions about this, and I expected some bikeshedding, but your reaction is contrary to everything others have told me. Mostly they love the operators. I guess that '~>' and '~>>' would work as well as '->' and '->>'. cheers andrew
On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 01/31/2013 05:06 PM, Peter Eisentraut wrote: >> >> On 1/10/13 6:42 PM, Andrew Dunstan wrote: >>> >>> This updated patch contains all the intended functionality, including >>> operators for the json_get_path functions, so you can say things like >>> >>> select jsonval->array['f1','0','f2] ... >> >> I would like to not create any -> operators, so that that syntax could >> be used in the future for method invocation or something similar (it's >> in the SQL standard). > > > > This is the first time I have heard that we should stay away from this. We > have operators with this name in hstore, which is why I chose it. > > Have we officially deprecated '->'? I know we deprecated "=>", but I simply > don't recall anything about '->'. > > >> >> I also don't find the proposed use to be very intuitive. You invented >> lots of other function names -- why not invent a few more for this >> purpose that are clearer? >> >> > > > I'm happy to take opinions about this, and I expected some bikeshedding, but > your reaction is contrary to everything others have told me. Mostly they > love the operators. > > I guess that '~>' and '~>>' would work as well as '->' and '->>'. also hstore implements -> quick off-topic aside: is colon (:) reserved for any purpose as an operator in SQL? merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 01/31/2013 05:06 PM, Peter Eisentraut wrote: >>> I would like to not create any -> operators, so that that syntax could >>> be used in the future for method invocation or something similar (it's >>> in the SQL standard). >> This is the first time I have heard that we should stay away from this. We >> have operators with this name in hstore, which is why I chose it. I'm not happy about this either. It's bad enough that we're thinking about taking away =>, but to disallow -> as well? My inclination is to just say no, we're not implementing that. Even if we remove the contrib operators named that way, it's insane to suppose that nobody has chosen these names for user-defined operators in their applications. > quick off-topic aside: is colon (:) reserved for any purpose as an > operator in SQL? We disallow it as an operator character, because of the conflict with parameter/variable syntax in ecpg and psql. It was allowed before PG 7.0, IIRC. regards, tom lane
On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

> I'm happy to take opinions about this, and I expected some
> bikeshedding, but your reaction is contrary to everything others have
> told me. Mostly they love the operators.
>
> I guess that '~>' and '~>>' would work as well as '->' and '->>'.

Or +> and +>>, since ~ is set very high and small by some fonts (where
the fontmakers thought of it as a kind of superscript character).

I suppose that := is out of the question?

Best,

David
On 01/31/2013 07:16 PM, David E. Wheeler wrote: > On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > >> I'm happy to take opinions about this, and I expected some bikeshedding, but your reaction is contrary to everything othershave told me. Mostly they love the operators. >> >> I guess that '~>' and '~>>' would work as well as '->' and '->>'. > Or +> and +>>, since ~ is set very high and small by some fonts (where the fontmakers though of it as a kind of superscriptcharacter). > > I suppose that := is out of the question? > Even if it were I would not on any account use it. As an old Ada programmer my mind just revolts at the idea of using this for anything but assignment. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 01/31/2013 07:16 PM, David E. Wheeler wrote: >> I suppose that := is out of the question? > Even if it were I would not on any account use it. As an old Ada > programmer my mind just revolts at the idea of using this for anything > but assignment. Ada or no, its use in plpgsql would render that a seriously bad idea. regards, tom lane
On Jan 31, 2013, at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ada or no, its use in plpgsql would render that a seriously bad idea. I assumed that its use in function params would be the main reason not to use it. David
<div class="moz-cite-prefix">On 01/02/13 13:26, Andrew Dunstan wrote:<br /></div><blockquote cite="mid:510B0BD2.4010000@dunslane.net"type="cite"><br /> On 01/31/2013 07:16 PM, David E. Wheeler wrote: <br /><blockquotetype="cite">On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <a class="moz-txt-link-rfc2396E" href="mailto:andrew@dunslane.net"><andrew@dunslane.net></a>wrote: <br /><br /><blockquote type="cite">I'm happy totake opinions about this, and I expected some bikeshedding, but your reaction is contrary to everything others have toldme. Mostly they love the operators. <br /><br /> I guess that '~>' and '~>>' would work as well as '->' and'->>'. <br /></blockquote> Or +> and +>>, since ~ is set very high and small by some fonts (where the fontmakersthough of it as a kind of superscript character). <br /><br /> I suppose that := is out of the question? <br /><br/></blockquote><br /><br /> Even if it were I would not on any account use it. As an old Ada programmer my mind justrevolts at the idea of using this for anything but assignment. <br /><br /> cheers <br /><br /> andrew <br /><br /><br/></blockquote><font size="-1">Ancient Al<font size="-1">gol 60 programmer here, otherwise ditto!</font></font><br />
On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> On 01/31/2013 05:06 PM, Peter Eisentraut wrote: >>>> I would like to not create any -> operators, so that that syntax could >>>> be used in the future for method invocation or something similar (it's >>>> in the SQL standard). > >>> This is the first time I have heard that we should stay away from this. We >>> have operators with this name in hstore, which is why I chose it. > > I'm not happy about this either. It's bad enough that we're thinking > about taking away =>, but to disallow -> as well? My inclination is to > just say no, we're not implementing that. Even if we remove the contrib > operators named that way, it's insane to suppose that nobody has chosen > these names for user-defined operators in their applications. I think it's smarter for us to ship functions, and let users wrap them in operators if they so choose. It's not difficult for people who want it to do it, and that way we dodge this whole mess. The thing that was really awful about hstore's => operator is that it was =>(text, text), meaning that if somebody else invented, say, xstore, they could not do the same thing that hstore did without colliding with hstore. I think we ought to have an ironclad rule that any operators introduced in our core distribution for particular types must have at least one argument of that type. Otherwise, we'll end up with a free-for-all where everyone tries to grab the "good" operator names (of which there are only a handful) for their type, and types we're adding five years from now will be stuck with no operator names at all, or dumb stuff like ~~~~>. But even leaving that aside, I'm surprised to hear so many people dismissing SQL standards compliance so blithely. 
We've certainly spent a lot of blood, sweat, and tears on minor
standards-compliance issues over the years - why is it OK to not care
about this particular issue when we've spent so much effort caring
about other ones?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >>> On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>>> On 01/31/2013 05:06 PM, Peter Eisentraut wrote: >>>>> I would like to not create any -> operators, so that that syntax could >>>>> be used in the future for method invocation or something similar (it's >>>>> in the SQL standard). >> >>>> This is the first time I have heard that we should stay away from this. We >>>> have operators with this name in hstore, which is why I chose it. >> >> I'm not happy about this either. It's bad enough that we're thinking >> about taking away =>, but to disallow -> as well? My inclination is to >> just say no, we're not implementing that. Even if we remove the contrib >> operators named that way, it's insane to suppose that nobody has chosen >> these names for user-defined operators in their applications. > > I think it's smarter for us to ship functions, and let users wrap them > in operators if they so choose. It's not difficult for people who > want it to do it, and that way we dodge this whole mess. Normally I'd agree with you, but I think there's a complexity here that is very frown-inducing, although I'm not positively inclined to state that it's so great that your suggested solution is not the least of all evils: http://www.postgresql.org/message-id/8551.1331580169@sss.pgh.pa.us The problem being: even though pg_operator resolves to functions in pg_proc, they have distinct identities as far as the planner is concerned w.r.t selectivity estimation and index selection. Already there is a slight hazard that some ORMs that want to grow hstore support will spell it "fetchval" and others "->". So far, infix syntax seems to be the common default, but a minor disagreement among ORMs or different user preferences will be destructive. 
Another way to look at this is that by depriving people multiple choices in the default install, that hazard goes away. But it also means that, practically, CREATE OPERATOR is going to be hazardous to use because almost all software is probably not going to assume the existence of any non-default installed operators for JSON, and those that do will not receive the benefit of indexes from software using the other notation. So, I think that if one takes the 'when in doubt, leave it out' approach you seem to be proposing, I also think that very little profitable use of CREATE OPERATOR will take place -- ORMs et al will choose the lowest common denominator (for good sensible reason) and flirting with CREATE OPERATOR will probably cause surprising plans, so I doubt it'll take hold. -- fdr
Daniel Farina <daniel@heroku.com> writes: > On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think it's smarter for us to ship functions, and let users wrap them >> in operators if they so choose. It's not difficult for people who > The problem being: even though pg_operator resolves to functions in > pg_proc, they have distinct identities as far as the planner is > concerned w.r.t selectivity estimation and index selection. Yeah, this is surely not a workable policy unless we first move all those planner smarts to apply to functions not operators. And rewrite all the index AM APIs to use functions not operators, too. Now this is something that's been a wish-list item right along, but actually doing it has always looked like a great deal of work for rather small reward. regards, tom lane
On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Farina <daniel@heroku.com> writes: >> On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I think it's smarter for us to ship functions, and let users wrap them >>> in operators if they so choose. It's not difficult for people who > >> The problem being: even though pg_operator resolves to functions in >> pg_proc, they have distinct identities as far as the planner is >> concerned w.r.t selectivity estimation and index selection. > > Yeah, this is surely not a workable policy unless we first move all > those planner smarts to apply to functions not operators. And rewrite > all the index AM APIs to use functions not operators, too. Now this is > something that's been a wish-list item right along, but actually doing > it has always looked like a great deal of work for rather small reward. Hmm. Well, if the operators are going to be indexable, then I agree that's an issue, but isn't -> just a key-extraction operator? That wouldn't be something you could index anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/03/2013 08:20 PM, Robert Haas wrote: > On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Daniel Farina <daniel@heroku.com> writes: >>> On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> I think it's smarter for us to ship functions, and let users wrap them >>>> in operators if they so choose. It's not difficult for people who >>> The problem being: even though pg_operator resolves to functions in >>> pg_proc, they have distinct identities as far as the planner is >>> concerned w.r.t selectivity estimation and index selection. >> Yeah, this is surely not a workable policy unless we first move all >> those planner smarts to apply to functions not operators. And rewrite >> all the index AM APIs to use functions not operators, too. Now this is >> something that's been a wish-list item right along, but actually doing >> it has always looked like a great deal of work for rather small reward. > Hmm. Well, if the operators are going to be indexable, then I agree > that's an issue, but isn't -> just a key-extraction operator? That > wouldn't be something you could index anyway. > Er, what? It gives you the value corresponding to a key (or the numbered array element). With the Json operators I provided you're more likely to use ->> in an index, because it returns de-escaped text rather than json, but I don't see any reason in principle why -> couldn't be used. cheers andrew
On Fri, Feb 1, 2013 at 4:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: > But even leaving that aside, I'm surprised to hear so many people > dismissing SQL standards compliance so blithely. We've certainly > spent a lot of blood, sweat, and tears on minor standards-compliance > issues over the years - why is it OK to not care about this > particular issue when we've spent so much effort caring about other > ones? Does the SQL Standard suggest you can't extend the language with operators? Or does it reserve certain characters for future use? And if so, is there a list? merlin
On Sun, Feb 3, 2013 at 9:05 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> Yeah, this is surely not a workable policy unless we first move all >>> those planner smarts to apply to functions not operators. And rewrite >>> all the index AM APIs to use functions not operators, too. Now this is >>> something that's been a wish-list item right along, but actually doing >>> it has always looked like a great deal of work for rather small reward. >> >> Hmm. Well, if the operators are going to be indexable, then I agree >> that's an issue, but isn't -> just a key-extraction operator? That >> wouldn't be something you could index anyway. > > Er, what? It gives you the value corresponding to a key (or the numbered > array element). That's what I figured. > With the Json operators I provided you're more likely to use ->> in an > index, because it returns de-escaped text rather than json, but I don't see > any reason in principle why -> couldn't be used. The point is that if you're talking about indexing something like myeav->'andrew' you could equally well index json_get(myeav, 'andrew'). So there's no real need for it to be an operator rather than a function. The case in which it would matter is if it were something that could be used as an index predicate, like: Index Scan -> Index Cond: myeav->'andrew' As of now, the query planner won't consider such a plan if it's only a function and not an operator. So if we had a case like that, the use of operator notation could be justified on performance grounds. If we don't, I argue that it's better to stick with functional notation, because the number of sensible function names is much larger than the number of sensible operator names, and if we start using operator notation every time someone thinks it will look nicer that way, we will very quickly either run out of nice-looking operator names or start overloading them heavily. The SQL standards considerations seem worth thinking about, too. 
We've certainly gone through a lot of pain working toward eliminating => as an operator name, and if the SQL standard has commandeered -> for some purpose or other, I'd really rather not add to the headaches involved should we ever decide to reclaim it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
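The indexing distinction Robert draws can be sketched in SQL. (This is a sketch only: table and column names are invented, `->>` is the operator proposed in this thread, and `json_get_as_text` stands in for whatever function name the patch settles on, assuming it is declared IMMUTABLE.)

```sql
-- Either spelling works for an expression index, so no operator is
-- needed for that case (names hypothetical):
CREATE INDEX docs_andrew_op_idx   ON docs ((myeav ->> 'andrew'));
CREATE INDEX docs_andrew_fn_idx   ON docs (json_get_as_text(myeav, 'andrew'));

-- Operators matter when they are members of an operator class, as
-- with hstore's GIN support: the planner matches the operator itself
-- as an index condition.
CREATE INDEX t_attrs_idx ON t USING gin (attrs);
SELECT * FROM t WHERE attrs @> 'name=>andrew';  -- can become an index scan
-- There is no purely functional spelling the planner would match here.
```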
On 02/04/2013 10:47 AM, Robert Haas wrote: > > The SQL standards considerations seem worth thinking about, too. > We've certainly gone through a lot of pain working toward eliminating > => as an operator name, and if the SQL standard has commandeered -> > for some purpose or other, I'd really rather not add to the headaches > involved should we ever decide to reclaim it. OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think. My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say "then use a different font." Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) cheers andrew
On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think.
On 02/04/2013 10:47 AM, Robert Haas wrote:
The SQL standards considerations seem worth thinking about, too.
We've certainly gone through a lot of pain working toward eliminating
=> as an operator name, and if the SQL standard has commandeered ->
for some purpose or other, I'd really rather not add to the headaches
involved should we ever decide to reclaim it.
My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say "then use a different font." Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-)
As a user I would be much in favor of just functions and no additional operators if the sole difference is syntactical. I think custom operators are much harder to remember than function names (assuming reasonably well chosen function names).
Now Robert seems to suggest that there will also be a speed / planner difference, which seems sad (I would have expected operators to be just syntactical sugar for specially named functions, and once we are past the parser there should be no difference).
On Feb 4, 2013, at 8:10 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say "then use a different font." Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) On the contrary, I quite like ~>. I've used it in pair. http://pgxn.org/extension/pair But others have complained about the font issue when I've suggested it for things in the past. My fonts don't suck. :-) I can live with +> and +>>. David
On Mon, Feb 4, 2013 at 10:18 AM, Benedikt Grundmann <bgrundmann@janestreet.com> wrote: > > > > On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> >> >> On 02/04/2013 10:47 AM, Robert Haas wrote: >>> >>> >>> The SQL standards considerations seem worth thinking about, too. >>> We've certainly gone through a lot of pain working toward eliminating >>> => as an operator name, and if the SQL standard has commandeered -> >>> for some purpose or other, I'd really rather not add to the headaches >>> involved should we ever decide to reclaim it. >> >> >> >> OK, but I'd like to know what is going to be safe. There's no way to >> future-proof the language. I'm quite prepared to replace -> with something >> else, and if I do then ->> will need to be adjusted accordingly, I think. >> >> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that >> on the ground that some fonts elevate ~ rather than aligning it in the >> middle as most monospaced fonts do, but I'm tempted just to say "then use a >> different font." Other possibilities that come to mind are +> and +>>, >> although I think they're less attractive. But I'll be guided by the >> consensus, assuming there is one ;-) >> > As a user I would be much in favor of just functions and no additional > operators if the sole difference is syntactical. I think custom operators > are much harder to remember than function names (assuming reasonably well > chosen function names). couple quick observations: *) just about all postgres extension types expose operators -- problem is not specific to json (and therefore IMO irrelevant to 9.3 release of enhancements) *) hstore exposes ->. I use it all over the place. I find operator to be terse and readable -- much more so than function definition. Ok, operator such as "@-@" is pretty silly, but "->" for get is natural. The cat is out of the bag, so removing -> for 9.3 for production seems pretty fruitless. 
*) Removing -> (breaking all my and countless others' hstore dependent code) should not happen until there is a very good reason. This was extensively discussed in development of hstore. Breaking compatibility sucks -- my company is just wrapping up a 12 month code overhaul so we could get off 8.1. *) it's bad enough that we drift from sql naming conventions and prefix all type manipulation functions (except in hstore) with the type name. json_get etc. At least using operators allows avoiding some of that unnecessary verbosity. What's next: text_concatenate('foo', 'bar')? merlin
On 02/04/2013 12:57 PM, Merlin Moncure wrote: > *) it's bad enough that we drift from sql naming conventions and all > type manipulation functions (except in hstore) with type name. > json_get etc. at least using operators allow avoiding some of that > unnecessary verbosity. what's next: text_concatenate('foo', 'bar')? > These names aren't set in stone either. I've been expecting some bikeshedding there, and I'm surprised there hasn't been more. cheers andrew
On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 02/04/2013 10:47 AM, Robert Haas wrote: >> >> >> The SQL standards considerations seem worth thinking about, too. >> We've certainly gone through a lot of pain working toward eliminating >> => as an operator name, and if the SQL standard has commandeered -> >> for some purpose or other, I'd really rather not add to the headaches >> involved should we ever decide to reclaim it. > > OK, but I'd like to know what is going to be safe. There's no way to > future-proof the language. I'm quite prepared to replace -> with something > else, and if I do then ->> will need to be adjusted accordingly, I think. > > My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on > the ground that some fonts elevate ~ rather than aligning it in the middle > as most monospaced fonts do, but I'm tempted just to say "then use a > different font." Other possibilities that come to mind are +> and +>>, > although I think they're less attractive. But I'll be guided by the > consensus, assuming there is one ;-) I suspect both of those are pretty safe from an SQL standards point of view. Of course, as Tom is often wont to point out, the SQL standards committee sometimes does bizarre things, so nothing's perfect, but I'd be rather shocked if any of those got tapped to mean something else. That having been said, I still don't see value in adding operators at all. Good old function call notation seems perfectly adequate from where I sit. Sure, it's a little more verbose, but when you try too hard to make things concise then you end up having to explain to your users why \ditS is a sensible thing for them to type into psql, or why s@\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose this argument, but I've worked with a couple of languages where operators can be overloaded (C++) or defined (ML) and it's just never seemed to work out very well. YMMV, of course. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 4, 2013 at 12:07 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 02/04/2013 12:57 PM, Merlin Moncure wrote: > >> *) it's bad enough that we drift from sql naming conventions and all >> type manipulation functions (except in hstore) with type name. >> json_get etc. at least using operators allow avoiding some of that >> unnecessary verbosity. what's next: text_concatenate('foo', 'bar')? >> > > This names aren't set in stone either. I've been expecting some bikeshedding > there, and I'm surprised there hasn't been more. Well -- heh (asked to bikeshed: joy!) -- I felt like my objections were noted and am more interested in getting said functionality out the door than splitting hairs over names. Type prefix issue is under the same umbrella as use of the -> operator, that is, *not specifically related to this patch, and not worth holding up this patch over*. In both cases it's essentially crying over spilt milk. My only remaining nit with the proposal is with json_unnest(). SQL unnest() produces list of scalars regardless of dimensionality -- json unnest unwraps one level only (contrast: pl/pgsql array 'slice'). So I think 'json_array_each', or perhaps json_slice(), is a better fit there. merlin
On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I suspect both of those are pretty safe from an SQL standards point of
view. Of course, as Tom is often wont to point out, the SQL standards
committee sometimes does bizarre things, so nothing's perfect, but I'd
be rather shocked if any of those got tapped to mean something else.
That having been said, I still don't see value in adding operators at
all. Good old function call notation seems perfectly adequate from
where I sit. Sure, it's a little more verbose, but when you try
too hard to make things concise then you end up having to explain to your
users why \ditS is a sensible thing for them to type into psql, or why
s@\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
this argument, but I've worked with a couple of languages where
operators can be overloaded (C++) or defined (ML) and it's just never
seemed to work out very well. YMMV, of course.
For what my opinion is worth I absolutely agree with just having function names. The -> in hstore is kind of nice, but it led me to a whole lot of grief when I couldn't figure out how to create an index using it (turns out you have to use _double_ parens, who knew?), but could create an index on fetchval and assumed that postgres would figure it out.
Also, for quite a while it felt just like incantation as to when I'd need parens around those operators or not. Now that I sorta-kinda-not-really understand the operator precedence rules in postgres/sql standard, I've mostly given up on using cute operators because they're much more of a pain on a day-to-day basis.
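The parenthesization gotcha described above can be sketched with hstore (a minimal example; fetchval is the function behind hstore's -> and is declared IMMUTABLE, so it is usable in an index expression):

```sql
CREATE EXTENSION hstore;
CREATE TABLE t (attrs hstore);

-- An operator expression used as an index key needs its own set of
-- parentheses, hence the "double parens":
CREATE INDEX t_name_op_idx ON t ((attrs -> 'name'));
-- CREATE INDEX t_name_op_idx ON t (attrs -> 'name');  -- syntax error

-- The functional spelling needs no extra parentheses:
CREATE INDEX t_name_fn_idx ON t (fetchval(attrs, 'name'));

-- Note: the planner treats the two spellings as distinct expressions,
-- so a query written with -> will not use t_name_fn_idx, and a query
-- written with fetchval() will not use t_name_op_idx.
```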
On 02/04/2013 02:59 PM, Merlin Moncure wrote: > On Mon, Feb 4, 2013 at 12:07 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 02/04/2013 12:57 PM, Merlin Moncure wrote: >> >>> *) it's bad enough that we drift from sql naming conventions and all >>> type manipulation functions (except in hstore) with type name. >>> json_get etc. at least using operators allow avoiding some of that >>> unnecessary verbosity. what's next: text_concatenate('foo', 'bar')? >>> >> This names aren't set in stone either. I've been expecting some bikeshedding >> there, and I'm surprised there hasn't been more. > Well -- heh (asked to bikeshed: joy!) -- I felt like my objections > were noted and am more interested in getting said functionality out > the door than splitting hairs over names. Type prefix issue is under > the same umbrella as use of the -> operator, that is, *not > specifically related to this patch, and not worth holding up this > patch over*. In both cases it's essentially crying over spilt milk. > > My only remaining nit with the proposal is with json_unnest(). > > SQL unnest() produces list of scalars regardless of dimensionality -- > json unnest unwraps one level only (contrast: pl/pgsql array 'slice'). > So I think 'json_array_each', or perhaps json_slice() is a better fit > there. > how about json_array_elements()? cheers andrew
On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 02/04/2013 10:47 AM, Robert Haas wrote: >>> >>> >>> The SQL standards considerations seem worth thinking about, too. >>> We've certainly gone through a lot of pain working toward eliminating >>> => as an operator name, and if the SQL standard has commandeered -> >>> for some purpose or other, I'd really rather not add to the headaches >>> involved should we ever decide to reclaim it. >> >> OK, but I'd like to know what is going to be safe. There's no way to >> future-proof the language. I'm quite prepared to replace -> with something >> else, and if I do then ->> will need to be adjusted accordingly, I think. >> >> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on >> the ground that some fonts elevate ~ rather than aligning it in the middle >> as most monospaced fonts do, but I'm tempted just to say "then use a >> different font." Other possibilities that come to mind are +> and +>>, >> although I think they're less attractive. But I'll be guided by the >> consensus, assuming there is one ;-) > > I suspect both of those are pretty safe from an SQL standards point of > view. Of course, as Tom is often wont to point out, the SQL standards > committee sometimes does bizarre things, so nothing's perfect, but I'd > be rather shocked if any of those got tapped to mean something else. > > That having been said, I still don't see value in adding operators at > all. Good old function call notation seems perfectly adequate from > where I sit. Sure, it's a little more verbose, but when you try to > too hard make things concise then you end up having to explain to your > users why \ditS is a sensible thing for them to type into psql, or why > s@\W@sprintf"%%%02x",ord($&)@e in Perl. 
I recognize that I may lose > this argument, but I've worked with a couple of languages where > operators can be overloaded (C++) or defined (ML) and it's just never > seemed to work out very well. YMMV, of course. I also basically feel this way, although I know I tend more towards notational brutalism than many. I think we shouldn't kid ourselves that non-default operators will be used, and for current-implementation reasons (that maybe could be fixed by someone determined) it's not really at the pleasure of the author to use them via CREATE OPERATOR either. So, I basically subscribe to the view that we should investigate what total reliance on prefix syntax looks like. I guess it'll make nested navigation horribly ugly, though...positively lisp-esque. That's one consideration hstore doesn't have that may make infix notation considerably more useful for json than hstore. -- fdr
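A sketch of what the two notations look like for nested access (function name hypothetical, following the json_get style from the patch under discussion):

```sql
-- Prefix (function-call) syntax nests inside out:
SELECT json_get(json_get(json_get(myjson, 'authors'), 0), 'name') FROM docs;

-- Infix chaining with the proposed operators reads left to right:
SELECT myjson -> 'authors' -> 0 ->> 'name' FROM docs;
```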
On 02/04/2013 03:16 PM, Daniel Farina wrote: > On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> On 02/04/2013 10:47 AM, Robert Haas wrote: >>>> >>>> The SQL standards considerations seem worth thinking about, too. >>>> We've certainly gone through a lot of pain working toward eliminating >>>> => as an operator name, and if the SQL standard has commandeered -> >>>> for some purpose or other, I'd really rather not add to the headaches >>>> involved should we ever decide to reclaim it. >>> OK, but I'd like to know what is going to be safe. There's no way to >>> future-proof the language. I'm quite prepared to replace -> with something >>> else, and if I do then ->> will need to be adjusted accordingly, I think. >>> >>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on >>> the ground that some fonts elevate ~ rather than aligning it in the middle >>> as most monospaced fonts do, but I'm tempted just to say "then use a >>> different font." Other possibilities that come to mind are +> and +>>, >>> although I think they're less attractive. But I'll be guided by the >>> consensus, assuming there is one ;-) >> I suspect both of those are pretty safe from an SQL standards point of >> view. Of course, as Tom is often wont to point out, the SQL standards >> committee sometimes does bizarre things, so nothing's perfect, but I'd >> be rather shocked if any of those got tapped to mean something else. >> >> That having been said, I still don't see value in adding operators at >> all. Good old function call notation seems perfectly adequate from >> where I sit. Sure, it's a little more verbose, but when you try to >> too hard make things concise then you end up having to explain to your >> users why \ditS is a sensible thing for them to type into psql, or why >> s@\W@sprintf"%%%02x",ord($&)@e in Perl. 
I recognize that I may lose >> this argument, but I've worked with a couple of languages where >> operators can be overloaded (C++) or defined (ML) and it's just never >> seemed to work out very well. YMMV, of course. > I also basically feel this way, although I know I tend more towards > notational brutalism than many. I think we shouldn't kid ourselves > that non-default operators will be used, and for > current-implementation reasons (that maybe could be fixed by someone > determined) it's not really at the pleasure of the author to use them > via CREATE OPERATOR either. > > So, I basically subscribe to view that we should investigate what > total reliance on prefix syntax looks like. I guess it'll make nested > navigation horribly ugly, though...positively lisp-esque. That' s one > consideration hstore doesn't have that may make use of infix notations > considerably more useful for json than hstore. > We don't have the luxury of designing things like this in or out from scratch. Creation of operators has been a part of PostgreSQL for a good while longer than my involvement, and a great many people expect to be able to use it. I can just imagine the outrage at any suggestion of removing it. So, please, let's get real. A "total reliance on prefix syntax" isn't going to happen, and investigating what it would look like seems to me just so much wasted time and effort. cheers andrew
On Mon, Feb 4, 2013 at 2:10 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > >> My only remaining nit with the proposal is with json_unnest(). >> >> SQL unnest() produces list of scalars regardless of dimensionality -- >> json unnest unwraps one level only (contrast: pl/pgsql array 'slice'). >> So I think 'json_array_each', or perhaps json_slice() is a better fit >> there. >> > > > how about json_array_elements()? that works (although it's a little verbose for my taste). maybe json_unwrap, json_array_unwrap, etc. merlin
On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 02/04/2013 03:16 PM, Daniel Farina wrote: >> >> On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas@gmail.com> >> wrote: >>> >>> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew@dunslane.net> >>> wrote: >>>> >>>> On 02/04/2013 10:47 AM, Robert Haas wrote: >>>>> >>>>> >>>>> The SQL standards considerations seem worth thinking about, too. >>>>> We've certainly gone through a lot of pain working toward eliminating >>>>> => as an operator name, and if the SQL standard has commandeered -> >>>>> for some purpose or other, I'd really rather not add to the headaches >>>>> involved should we ever decide to reclaim it. >>>> >>>> OK, but I'd like to know what is going to be safe. There's no way to >>>> future-proof the language. I'm quite prepared to replace -> with >>>> something >>>> else, and if I do then ->> will need to be adjusted accordingly, I >>>> think. >>>> >>>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that >>>> on >>>> the ground that some fonts elevate ~ rather than aligning it in the >>>> middle >>>> as most monospaced fonts do, but I'm tempted just to say "then use a >>>> different font." Other possibilities that come to mind are +> and +>>, >>>> although I think they're less attractive. But I'll be guided by the >>>> consensus, assuming there is one ;-) >>> >>> I suspect both of those are pretty safe from an SQL standards point of >>> view. Of course, as Tom is often wont to point out, the SQL standards >>> committee sometimes does bizarre things, so nothing's perfect, but I'd >>> be rather shocked if any of those got tapped to mean something else. >>> >>> That having been said, I still don't see value in adding operators at >>> all. Good old function call notation seems perfectly adequate from >>> where I sit. 
Sure, it's a little more verbose, but when you try to >>> too hard make things concise then you end up having to explain to your >>> users why \ditS is a sensible thing for them to type into psql, or why >>> s@\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose >>> this argument, but I've worked with a couple of languages where >>> operators can be overloaded (C++) or defined (ML) and it's just never >>> seemed to work out very well. YMMV, of course. >> >> I also basically feel this way, although I know I tend more towards >> notational brutalism than many. I think we shouldn't kid ourselves >> that non-default operators will be used, and for >> current-implementation reasons (that maybe could be fixed by someone >> determined) it's not really at the pleasure of the author to use them >> via CREATE OPERATOR either. >> >> So, I basically subscribe to view that we should investigate what >> total reliance on prefix syntax looks like. I guess it'll make nested >> navigation horribly ugly, though...positively lisp-esque. That' s one >> consideration hstore doesn't have that may make use of infix notations >> considerably more useful for json than hstore. >> > > > We don't have the luxury of designing things like this in or out from > scratch. Creation of operators has been a part of PostgreSQL for a good > while longer than my involvement, and a great many people expect to be able > to use it. I can just imagine the outrage at any suggestion of removing it. I am only referring to the restriction that the planner can't understand that fetchval() and '->' mean the same thing for, say, hstore. Hence, use of non-default CREATE OPERATOR may become more useful some day, instead of basically being a pitfall when someone reasonably thinks they could use either spelling of the same functionality and the optimizer will figure it out. I'm not suggesting removal of any feature. 
My reference to "total reliance of prefix syntax" refers only to the JSON operators, since the previous correspondence from Robert was about how function call syntax alone may be sufficient. This phrase refers to the same idea he is proposing. I also included a weakness to that idea, which is that nesting in JSON makes the situation worse than the common compared case, hstore. -- fdr
On 02/04/2013 04:19 PM, Daniel Farina wrote: > On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 02/04/2013 03:16 PM, Daniel Farina wrote: >>> On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas@gmail.com> >>> wrote: >>>> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew@dunslane.net> >>>> wrote: >>>>> On 02/04/2013 10:47 AM, Robert Haas wrote: >>>>>> >>>>>> The SQL standards considerations seem worth thinking about, too. >>>>>> We've certainly gone through a lot of pain working toward eliminating >>>>>> => as an operator name, and if the SQL standard has commandeered -> >>>>>> for some purpose or other, I'd really rather not add to the headaches >>>>>> involved should we ever decide to reclaim it. >>>>> OK, but I'd like to know what is going to be safe. There's no way to >>>>> future-proof the language. I'm quite prepared to replace -> with >>>>> something >>>>> else, and if I do then ->> will need to be adjusted accordingly, I >>>>> think. >>>>> >>>>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that >>>>> on >>>>> the ground that some fonts elevate ~ rather than aligning it in the >>>>> middle >>>>> as most monospaced fonts do, but I'm tempted just to say "then use a >>>>> different font." Other possibilities that come to mind are +> and +>>, >>>>> although I think they're less attractive. But I'll be guided by the >>>>> consensus, assuming there is one ;-) >>>> I suspect both of those are pretty safe from an SQL standards point of >>>> view. Of course, as Tom is often wont to point out, the SQL standards >>>> committee sometimes does bizarre things, so nothing's perfect, but I'd >>>> be rather shocked if any of those got tapped to mean something else. >>>> >>>> That having been said, I still don't see value in adding operators at >>>> all. Good old function call notation seems perfectly adequate from >>>> where I sit. 
Sure, it's a little more verbose, but when you try to >>>> too hard make things concise then you end up having to explain to your >>>> users why \ditS is a sensible thing for them to type into psql, or why >>>> s@\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose >>>> this argument, but I've worked with a couple of languages where >>>> operators can be overloaded (C++) or defined (ML) and it's just never >>>> seemed to work out very well. YMMV, of course. >>> I also basically feel this way, although I know I tend more towards >>> notational brutalism than many. I think we shouldn't kid ourselves >>> that non-default operators will be used, and for >>> current-implementation reasons (that maybe could be fixed by someone >>> determined) it's not really at the pleasure of the author to use them >>> via CREATE OPERATOR either. >>> >>> So, I basically subscribe to view that we should investigate what >>> total reliance on prefix syntax looks like. I guess it'll make nested >>> navigation horribly ugly, though...positively lisp-esque. That' s one >>> consideration hstore doesn't have that may make use of infix notations >>> considerably more useful for json than hstore. >>> >> >> We don't have the luxury of designing things like this in or out from >> scratch. Creation of operators has been a part of PostgreSQL for a good >> while longer than my involvement, and a great many people expect to be able >> to use it. I can just imagine the outrage at any suggestion of removing it. > I am only referring to referring the restriction that the planner > can't understand that fetchval() and '->' mean the same thing for, > say, hstore. Hence, use of non-default CREATE OPERATOR may become > more useful some day, instead of basically being a pitfall when > someone reasonably thinks they could use either spelling of the same > functionality and the optimizer will figure it out. > > I'm not suggesting removal of any feature. 
> > My reference to "total reliance of prefix syntax" refers only to the > JSON operators, since the previous correspondence from Robert was > about how function call syntax alone may be sufficient. This phrase > refers to the same idea he is proposing. > > I also included a weakness to that idea, which is that nesting in JSON > makes the situation worse than the common compared case, hstore. I see. OK, sorry for misunderstanding. I suspect, BTW that mostly people will use get_path*() (or rather, its equivalent operator ;-) ) rather than operator chaining: select myjson->>'{"authors",0,"name"}'::text[]; cheers andrew
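For comparison, Andrew's path-based lookup can be spelled either with the proposed operator or with the get_path* functions he mentions (exact names and signatures are still under discussion in this thread, so treat this as a sketch):

```sql
-- Operator form, taking a text[] path (Andrew's example above):
SELECT myjson ->> '{"authors",0,"name"}'::text[] FROM docs;

-- Equivalent get_path* spelling (function name hypothetical):
SELECT json_get_path_as_text(myjson, 'authors', '0', 'name') FROM docs;
```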
On 01/31/2013 11:20 PM, Andrew Dunstan wrote:
>
> I'm happy to take opinions about this, and I expected
> some bikeshedding, but your reaction is contrary to
> everything others have told me. Mostly they love the operators.

What I would really like is if we extended the PostgreSQL core and made
a few more constructs definable as overloadable operators:

1) array / dictionary element lookup

   a[b]

   CREATE OPERATOR [] (...)

2) attribute lookup

   a.b

   CREATE OPERATOR . (...)

then you could make json lookups either step-by-step using

CREATE OPERATOR [] (
    PROCEDURE = json_array_lookup, LEFTARG = json, RIGHTARG = int)

and

CREATE OPERATOR [] (
    PROCEDURE = json_dict_lookup, LEFTARG = json, RIGHTARG = text)

fourthname = myjson[4]['name']

or perhaps a single

CREATE OPERATOR [] (
    PROCEDURE = json_deep_lookup, LEFTARG = json, RIGHTARG = VARIADIC "any")

fourthname = myjson[4, 'name']

though I suspect that we do not support type VARIADIC "any" in operator
definitions

---------
Hannu

> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
>
>
> cheers
>
> andrew
>
>
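Hannu's proposal maps naturally onto languages where [] is already overloadable. As an illustration only (Python, not anything PostgreSQL supports today), a small wrapper class can mimic both the step-by-step and the variadic forms of the proposed CREATE OPERATOR []:

```python
import json

class Json:
    """Illustrative wrapper: overloads [] the way the proposed
    CREATE OPERATOR [] would, for both int and text subscripts."""
    def __init__(self, value):
        self.value = value

    def __getitem__(self, key):
        if isinstance(key, tuple):      # variadic form: myjson[4, 'name']
            node = self.value
            for k in key:
                node = node[k]
            return Json(node)
        return Json(self.value[key])    # step-by-step form: myjson[4]['name']

myjson = Json(json.loads('{"authors": [{"name": "Hannu"}]}'))
print(myjson["authors"][0]["name"].value)   # step-by-step lookup
print(myjson["authors", 0, "name"].value)   # variadic lookup
```

Python dispatches `a[x, y, z]` as a single `__getitem__` call with a tuple, which is essentially what the VARIADIC "any" variant would need from the operator machinery.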
2013/2/5 Hannu Krosing <hannu@krosing.net>:
> On 01/31/2013 11:20 PM, Andrew Dunstan wrote:
>>
>>
>> I'm happy to take opinions about this, and I expected
>> some bikeshedding, but your reaction is contrary to
>> everything others have told me. Mostly they love the operators.
>
> What I would really like is if we extended the PostgreSQL core and made
> a few more constructs definable as overloadable operators:
>
> 1) array / dictionary element lookup
>    a[b]
>    CREATE OPERATOR [] (...)
>
> 2) attribute lookup
>    a.b
>    CREATE OPERATOR . (...)
>
> then you could make json lookups either step-by-step using
>
> CREATE OPERATOR [] (
>     PROCEDURE = json_array_lookup, LEFTARG = json, RIGHTARG = int)
>
> and
>
> CREATE OPERATOR [] (
>     PROCEDURE = json_dict_lookup, LEFTARG = json, RIGHTARG = text)
>
> fourthname = myjson[4]['name']
>
> or perhaps a single
>
> CREATE OPERATOR [] (
>     PROCEDURE = json_deep_lookup, LEFTARG = json, RIGHTARG = VARIADIC "any")
>
> fourthname = myjson[4, 'name']
>

It is close to a full collection implementation - and it would be nice
to have. At some point we should return to this topic.

My preference is to use well-named functions (I prefer them over
operators) and operators that are not in collision with current ANSI SQL.

I don't see anything nice in the design of

select myjson->>'{"authors",0,"name"}'::text[];

- moreover, it is as ugly as dinosaurs. Better and more usual:

myjson['authors']['0']['name']

or

myjson['authors/0/name']

Regards

Pavel

> though I suspect that we do not support type VARIADIC "any" in operator
> definitions
>
> ---------
> Hannu
>
>> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
>>
>> cheers
>>
>> andrew
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
On 02/05/2013 02:09 AM, Pavel Stehule wrote:
>
> I don't see anything nice in the design of
>
> select myjson->>'{"authors",0,"name"}'::text[];
>
> - moreover, it is as ugly as dinosaurs.

I rather like dinosaurs. Beauty is, as they say, in the eye of the
beholder.

Let me also point out that you can say (somewhat less efficiently):

myjson->'authors'->0->>'name'

which is not terribly inelegant.

> Better and more usual:
>
> myjson['authors']['0']['name']
>
> or
>
> myjson['authors/0/name']

Well, nothing like that is going to happen in this release. If you or
someone wants to work on a general subscripting facility for arbitrary
data types then I look forward to seeing it.

Let me also point out that the most important part of this patch is the
part that almost nobody has commented on, namely the parser changes and
the API that the actual visible functions are built on. Writing JSON
accessor / transformation functions without this framework is hard, and
often redundant. I'm much more concerned about getting this framework
and some basic accessor functions (and preferably operators) added than
about how the latter are precisely spelled.

cheers

andrew
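The difference between the two operators in Andrew's chained example is that -> yields a json value at each step, while ->> renders the final result as text. A small Python analogue of that chain (helper names are hypothetical; this sketches the semantics, not the server code):

```python
import json

def arrow(node, key):
    """-> : fetch a field/element, keeping it as a JSON value (None for NULL)."""
    try:
        return node[key]
    except (KeyError, IndexError, TypeError):
        return None

def arrow_text(node, key):
    """->> : fetch a field/element and render it as text rather than json."""
    val = arrow(node, key)
    if val is None:
        return None
    return val if isinstance(val, str) else json.dumps(val)

doc = json.loads('{"authors": [{"name": "Pavel"}]}')
# myjson->'authors'->0->>'name'
print(arrow_text(arrow(arrow(doc, "authors"), 0), "name"))   # Pavel
```

The chained form re-enters the document once per level, which is why Andrew calls it "somewhat less efficient" than resolving the whole path in one get_path*() call.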
2013/2/5 Andrew Dunstan <andrew@dunslane.net>:
>
> On 02/05/2013 02:09 AM, Pavel Stehule wrote:
>
>> I don't see anything nice in the design of
>>
>> select myjson->>'{"authors",0,"name"}'::text[];
>>
>> - moreover, it is as ugly as dinosaurs.
>
> I rather like dinosaurs. Beauty is, as they say, in the eye of the
> beholder.
>
> Let me also point out that you can say (somewhat less efficiently):
>
> myjson->'authors'->0->>'name'
>
> which is not terribly inelegant.
>
>> Better and more usual:
>>
>> myjson['authors']['0']['name']
>>
>> or
>>
>> myjson['authors/0/name']
>
> Well, nothing like that is going to happen in this release. If you or
> someone wants to work on a general subscripting facility for arbitrary
> data types then I look forward to seeing it.
>
> Let me also point out that the most important part of this patch is the
> part that almost nobody has commented on, namely the parser changes and
> the API that the actual visible functions are built on. Writing JSON
> accessor / transformation functions without this framework is hard, and
> often redundant. I'm much more concerned about getting this framework
> and some basic accessor functions (and preferably operators) added than
> about how the latter are precisely spelled.

The C API and implementation can be changed or fixed later without much
trouble - that is usual. The SQL interface is where the hard discussion
is.

Regards

Pavel

> cheers
>
> andrew