Discussion: PATCH: Implement value_to_json for single-datum conversion
Hi all

Whenever I try to work with the new json types I trip over the lack of a function to escape text to json. The attached patch against master provides one by exposing the existing datum_to_json function to SQL. I've used the name value_to_json, but I'm not sure it's necessarily the right choice of name.

Please consider this for the 9.2 branch as well as for HEAD, as IMO it's very important for basic usability of the json functionality. It applies to 9.2 fine and passes "make check". I know it's late in the game, but it's also a very small change, and it's very hard to build up JSON data structures other than simple rows or arrays without at minimum a way of escaping `text' to json strings.

This feels basic enough that I'm wondering if there's a reason it wasn't included from the start, but I don't see any comments in json.c talking about anything like this, nor did I find any -hackers discussion about it. I suspect it's just an oversight.

As value_to_json directly wraps datum_to_json it actually accepts record and array types too. I didn't see any reason to prevent that and force the user to instead use row_to_json or array_to_json for those cases. If you don't want to accept this, I can provide a wrapper for escape_json that only accepts a text argument instead, but I think *some* way to escape text to JSON is vital to have in 9.2.

A docs patch will follow shortly if you're happy that this patch is reasonable.

-- 
Craig Ringer
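[For readers following along: "escaping text to a JSON string" means wrapping the value in double quotes and backslash-escaping quotes, backslashes, and control characters, which is what escape_json does in C. A minimal illustration in Python — not part of the patch; json.dumps merely stands in for the C escaping routine:

```python
import json

# Escaping a text value to a JSON string literal: embedded quotes,
# backslashes and control characters must all be backslash-escaped.
raw = 'a "quoted" line\nwith a backslash \\'
escaped = json.dumps(raw)
print(escaped)  # "a \"quoted\" line\nwith a backslash \\"

# Round-tripping through a JSON parser recovers the original text.
assert json.loads(escaped) == raw
```

Without a SQL-callable function doing this, any user building JSON by string concatenation has to reimplement exactly this escaping, which is the bug-magnet the patch tries to avoid.]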
Whoops. It actually looks like the posted patch muffed up the opr_sanity checks. I'm totally new to pg_proc.h wrangling so I'm not sure why yet; looking. Sorry, not sure how I missed that. I'll follow up shortly.

-- 
Craig Ringer
OK, opr_sanity was failing because I added the value_to_json(text) alias to ensure that:

    value_to_json('some_literal')

worked, following the same approach as quote_literal(anyelement) and quote_literal(text). That should be reasonable, right? The comments on the affected check in opr_sanity say that it's not necessarily wrong so long as the called function is prepared to handle the different arguments itself - which it is, since it's already accepting anyelement. The test comment reads:

    Note that the expected output of this part of the test will need to be
    modified whenever new pairs of types are made binary-equivalent, or when
    new polymorphic built-in functions are added

so that seems reasonable.

    postgres=# \df quote_literal
                                  List of functions
       Schema   |     Name      | Result data type | Argument data types |  Type
    ------------+---------------+------------------+---------------------+--------
     pg_catalog | quote_literal | text             | anyelement          | normal
     pg_catalog | quote_literal | text             | text                | normal
    (2 rows)

    postgres=# \df value_to_json
                                  List of functions
       Schema   |     Name      | Result data type | Argument data types |  Type
    ------------+---------------+------------------+---------------------+--------
     pg_catalog | value_to_json | json             | anyelement          | normal
     pg_catalog | value_to_json | json             | text                | normal
    (2 rows)

Revised patch that tweaks the expected result of opr_sanity attached.

-- 
Craig Ringer
Attachments
Craig Ringer <ringerc@ringerc.id.au> writes:
> OK, opr_sanity was failing because I added the value_to_json(text) alias
> to ensure that:
>     value_to_json('some_literal')
> worked, following the same approach as quote_literal(anyelement) and
> quote_literal(text). That should be reasonable, right?

No, it isn't. What you're proposing is to let opr_sanity think that text and anyelement are interchangeable to C functions, which is so far from reality as to be ludicrous. That would be seriously damaging to its ability to detect errors.

But more to the point, your analogy to quote_literal is faulty anyway. If you looked at that, what you'd find is that only quote_literal(text) is a C function. The other one is a SQL wrapper around a coercion to text followed by the C function. I rather imagine that the definition as you have it would crash on, say, value_to_json(42).

			regards, tom lane
Craig Ringer <ringerc@ringerc.id.au> writes:
> Whenever I try to work with the new json types I trip over the lack of a
> function to escape text to json. The attached patch against master
> provides one by exposing the existing datum_to_json function to SQL.
> ...
> This feels basic enough that I'm wondering if there's a reason it wasn't
> included from the start,

There actually was a previous thread about this:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
Note in particular Andrew's comment:

    Second, RFC 4627 is absolutely clear: a valid JSON value can
    only be an object or an array, so this thing about converting
    arbitrary datum values to JSON is a fantasy. If anything, we
    should adjust the JSON input routines to disallow anything else,
    rather than start to output what is not valid JSON.

It's possible he's misread the spec, but I think we ought to tread very carefully before adding "obvious" conversions we might regret later.

> Please consider this for the 9.2 branch as well as to HEAD, as IMO it's
> very important for basic usability of the json functionality. It applies
> to 9.2 fine and passes "make check". I know it's late in the game,

It's several months too late for feature additions to 9.2, especially ones that would require an initdb to install. We can look at this for 9.3, but I'm still worried about the spec-compliance question.

			regards, tom lane
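[To make the RFC 4627 point concrete: that spec defines a JSON *text* (a complete document) as an object or an array only; strings and numbers are valid solely as values nested inside one. A rough sketch of that distinction in Python — the function name is hypothetical, and Python's json module itself follows the later, more permissive reading, which is why the scalar cases parse but are then rejected:

```python
import json

def is_rfc4627_text(s: str) -> bool:
    """True if s is a complete JSON text per RFC 4627:
    the top level must be an object or an array, never a bare scalar."""
    try:
        value = json.loads(s)  # Python's parser accepts top-level scalars
    except ValueError:
        return False
    return isinstance(value, (dict, list))

print(is_rfc4627_text('{"k": "v"}'))  # True
print(is_rfc4627_text('[1, 2, 3]'))   # True
print(is_rfc4627_text('42'))          # False: bare number at top level
print(is_rfc4627_text('"escaped"'))   # False: bare string at top level
```

The last two cases are exactly what value_to_json would emit for scalar inputs, which is the spec-compliance worry.]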
On 08/13/2012 12:48 PM, Tom Lane wrote:
> There actually was a previous thread about this:
> http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
> Note in particular Andrew's comment:
>
>     Second, RFC 4627 is absolutely clear: a valid JSON value can
>     only be an object or an array, so this thing about converting
>     arbitrary datum values to JSON is a fantasy. If anything, we
>     should adjust the JSON input routines to disallow anything else,
>     rather than start to output what is not valid JSON.

Thanks for taking a look. That makes sense.

I guess these are similar issues to those the XML type faces, where working with fragments is a problem. The spec requires a single root element, but you don't always have one when you're *building* XML, hence the addition of `IS DOCUMENT'.

I was hoping to find a low-impact way to allow SQL-level construction of more complex JSON objects with correct text escaping, but it sounds like this isn't the right route. I don't currently see any way to achieve the kind of on-the-fly building you can do with XML's xmlelement(), xmlconcat(), xmlforest() etc., nor an equivalent to hstore's hstore(text[],text[]), and I was hoping to improve that.

I have a half-finished JSON object constructor json_object_from_arrays(text[], json[]) in the same style as hstore(text[],text[]). It won't work without the notion of json-typed scalars, though, as the values of keys could then only be arrays or objects, which isn't very useful. I can't usefully accept `anyarray' as a values argument since arrays are of homogeneous type, and accepting text[] would be a bug-magnet even if there were some kind of `text json_escape(text)' function.

Would it be reasonable to add a separate json_element type, one that's binary-equivalent to `json' but not constrained by the requirement to be an array or object/dict? Or a `jsobject'?

As for value_to_json crashing, works for me:

    postgres=# SELECT value_to_json(42);
     value_to_json
    ---------------
     42
    (1 row)

... 
since datum_to_json is happy to accept anything you throw at it using output function lookups, and value_to_json itself doesn't care about the argument type at all. That was all in the regression tests.

Purely so I understand what the correct handling of the anyelement+text overload would've been: in light of your comments on opr_sanity, would the right approach be to add a second C function like text_to_json that only accepts 'text', to avoid confusing the sanity check? So the SQL "value_to_json(anyelement)" would point to the C "value_to_json" and the SQL "value_to_json(text)" would point to the C "text_to_json"?

Anyway, clearly the value_to_json approach is out.

-- 
Craig Ringer
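[The half-finished json_object_from_arrays(text[], json[]) constructor mentioned above could plausibly behave along these lines; this is a rough Python model of the semantics inferred from the hstore(text[],text[]) analogy, not the actual C implementation:

```python
import json

def json_object_from_arrays(keys, values):
    """Build a JSON object from parallel key/value arrays, in the spirit
    of hstore(text[], text[]). Keys are text and get escaped; values are
    assumed to already be JSON fragments (the 'json[]' argument), so they
    are spliced in verbatim."""
    if len(keys) != len(values):
        raise ValueError("mismatched key and value array lengths")
    pairs = ", ".join(f"{json.dumps(k)}: {v}" for k, v in zip(keys, values))
    return "{" + pairs + "}"

print(json_object_from_arrays(["a", "b"], ['1', '["x", "y"]']))
# {"a": 1, "b": ["x", "y"]}
```

Note the value '1': under a strict RFC 4627 json type that scalar fragment could never exist as a json datum in the first place, which is exactly why the constructor stalls without some notion of json-typed scalars.]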
Craig Ringer <ringerc@ringerc.id.au> writes:
> As for value_to_json crashing, works for me:
>     postgres=# SELECT value_to_json(42);
>      value_to_json
>     ---------------
>      42
>     (1 row)

Oh, right, because there actually is support for anyelement in the underlying C function. There is not in the quote_literal case.

> Purely so I understand what the correct handling of the anyelement+text
> overload would've been: in light of your comments on opr_sanity, would
> the right approach be to add a second C function like text_to_json that
> only accepts 'text', to avoid confusing the sanity check?

Actually, given the above, what did you need value_to_json(text) for at all? Wouldn't value_to_json(anyelement) have covered it?

But yeah, the general approach to suppressing complaints from that opr_sanity test is to make more C entry points. The point of it, in some sense, is that if you want to make an assumption that two types are binary-equivalent then it's better to have that assumption in C code than embedded in the pg_proc entries. The cases that we let pass via the "expected" outputs are only ones where binary equivalence seems pretty well assured, like text vs varchar.

			regards, tom lane
On 08/13/2012 01:55 PM, Tom Lane wrote:
> Actually, given the above, what did you need value_to_json(text) for at
> all? Wouldn't value_to_json(anyelement) have covered it?

Usability. Without the version accepting text, an explicit cast to text is required to disambiguate a literal argument like value_to_json('something').

> But yeah, the general approach to suppressing complaints from that
> opr_sanity test is to make more C entry points. The point of it,
> in some sense, is that if you want to make an assumption that two
> types are binary-equivalent then it's better to have that assumption
> in C code than embedded in the pg_proc entries. The cases that we
> let pass via the "expected" outputs are only ones where binary
> equivalence seems pretty well assured, like text vs varchar.

Thanks. I appreciate the explanation, and sorry for the newbie error.

On the JSON stuff, I can see it's not as simple as adding a simple escape function. For my needs during the 9.2 timeframe I'll bundle up an extension with the functionality I need and deal with porting to whatever 9.3 includes. Hopefully a "json_value" or "javascript_value" or similar can be introduced for 9.3, given comments like:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00030.php
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00065.php

... expressing not only the need for json scalars, but the fact that they're already commonplace in pretty much everything else. Given this:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00040.php

it does seem that `json' should be a whole document, not a fragment, but IMO a way to work with individual JSON values is going to be *necessary* to get the most out of the json support - and to stop people who use JSON in the real world complaining that Pg's JSON support is broken because it follows the standard, not real-world practice.
Personally, the lack of json scalars has already prevented me from using the JSON support in two different places, though it's proving very useful in many others.

-- 
Craig Ringer