Обсуждение: JSON Path and GIN Questions

Поиск
Список
Период
Сортировка

JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption
thatJSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run
intowhat appear to be some inconsistencies and oddities I’m hoping to figure out with your help. 

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed
"s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-----------------

I have been confused as to the difference between @@ vs @?: Why do these return different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
 id
----
(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
 id
----
 10
(1 row)

I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got
there,it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists().
Isthat right? 

If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test
pathsto see if they return a boolean or not. 


Issue 2: @? Index Use
---------------------

From Oleg’s (happy belated birthday!) notes
(https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):


> Operators @? and @@ are interchangeable:
>
>     js @? '$.a' <=> js @@ 'exists($.a)’
>     js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a
result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))';
 id
----
 10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index
withan an explicit exists(), but @? was not…because I was still using an explicit exists. 

In other words:

* @@ 'exists($ ?($.year == 1944))'  Uses the index
* @? '$ ?(@.year == 1944)'          Uses the index
* @? 'exists($ ?($.year == 1944))'  Does not use the index

That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it
`exists(exists($?($.year == 1944)))`, which returns true for every row  (true and false both exists)! 🤦🏻‍♂️. 

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
---------------------------------

From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path
querywould be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage
ofthe GIN index, apparently the >= operator cannot: 

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
 Planning Time: 1.864 ms
 Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of
paths,which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is
thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on
`movie-> 'year'`? 

Thanks your your patience with my questions!

Best,

David


Вложения

Re: JSON Path and GIN Questions

От
Erik Wienhold
Дата:
Hi David,

On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote:

> CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
> \copy movies(movie) from PROGRAM 'curl -s
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed
"s|\\\\|\\\\\\\\|g"';
> create index on movies using gin (movie);
> analyze movies;
>
> I have been confused as to the difference between @@ vs @?: Why do these
> return different results?
>
> david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
>  id
> ----
> (0 rows)
>
> david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
>  id
> ----
>  10
> (1 row)
>
> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
> and from the suggestion I got there, it seems that @@ expects a boolean to be
> returned by the path query, while @? wraps it in an implicit exists(). Is that
> right?

That's also my understanding.  We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1].  Maybe it's useful also.

> If so, I’d like to submit a patch to the docs talking about this, and
> suggesting the use of jsonb_path_query() to test paths to see if they return
> a boolean or not.

+1

[1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com

--
Erik



Re: JSON Path and GIN Questions

От
Erik Rijkers
Дата:
Op 9/13/23 om 03:00 schreef Erik Wienhold:
> Hi David,
> 
> On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote:
> 
>> CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
>> \copy movies(movie) from PROGRAM 'curl -s
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed
"s|\\\\|\\\\\\\\|g"';
>> create index on movies using gin (movie);
>> analyze movies;
>>
>> I have been confused as to the difference between @@ vs @?: Why do these
>> return different results?
>>
>> david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
>>   id
>> ----
>> (0 rows)
>>
>> david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
>>   id
>> ----
>>   10
>> (1 row)
>>
>> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is that
>> right?
> 
> That's also my understanding.  We had a discussion about the docs on @@, @?, and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.
> 
>> If so, I’d like to submit a patch to the docs talking about this, and
>> suggesting the use of jsonb_path_query() to test paths to see if they return
>> a boolean or not.
> 
> +1
> 
> [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
> 
> --
> Erik


"All use of json*() functions preclude index usage."

That sentence is missing from the documentation.


Erik Rijkers







Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:

> "All use of json*() functions preclude index usage."
>
> That sentence is missing from the documentation.

Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why
wouldthe corresponding functions behave the same? 

D


Вложения

Re: JSON Path and GIN Questions

От
Erik Rijkers
Дата:
p 9/13/23 om 22:01 schreef David E. Wheeler:
> On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:
> 
>> "All use of json*() functions preclude index usage."
>>
>> That sentence is missing from the documentation.
> 
> Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why
wouldthe corresponding functions behave the same?
 
> 
> D

Sorry, perhaps my reply was a bit off-topic.
But you mentioned perhaps touching the docs and
the not-use-of-index is just so unexpected.
Compare these two statements:

select count(id) from movies where
movie @? '$ ? (@.year == 2023)'
Time: 1.259 ms
   (index used)

select count(id) from movies where
jsonb_path_match(movie, '$.year == 2023');
Time: 17.260 ms
   (no index used - unexpectedly slower)

With these two indexes available:
   using gin (movie);
   using gin (movie jsonb_path_ops);

(REL_15_STABLE; but it's the same in HEAD and
the not-yet-committed SQL/JSON patches.)

Erik Rijkers



Re: JSON Path and GIN Questions

От
Tom Lane
Дата:
Erik Rijkers <er@xs4all.nl> writes:
> p 9/13/23 om 22:01 schreef David E. Wheeler:
>> On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:
>>> "All use of json*() functions preclude index usage."

>> Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why
wouldthe corresponding functions behave the same? 

> Sorry, perhaps my reply was a bit off-topic.
> But you mentioned perhaps touching the docs and
> the not-use-of-index is just so unexpected.

Unexpected to who?  I think the docs make it pretty plain that only
operators on indexed columns are considered as index qualifications.
Admittedly, 11.2 Index Types [1] makes the point only by not
discussing any other case, but when you get to 11.10 Operator Classes
and Operator Families [2] and discover that the entire index definition
mechanism is based around operators not functions, you should be able
to reach that conclusion.  The point is made even more directly in
38.16 Interfacing Extensions to Indexes [3], though I'll concede
that that's not material I'd expect the average PG user to read.
As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
is pretty clear about what is or is not supported.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/indexes-types.html
[2] https://www.postgresql.org/docs/current/indexes-opclass.html
[3] https://www.postgresql.org/docs/current/xindex.html
[4] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
> is pretty clear about what is or is not supported.

How do you feel about this note, then?

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b6c2ddbf55..7dda727f0d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -413,6 +413,13 @@ SELECT doc->'site_name' FROM websites
     Two GIN <quote>operator classes</quote> are provided, offering different
     performance and flexibility trade-offs.
   </para>
+  <note>
+    <para>
+    As with all indexes, only operators on indexed columns are considered as
+    index qualifications. In other words, only <type>jsonb</type> operators can
+    take advantage of GIN indexes; <type>jsonb</type> functions cannot.
+    </para>
+  </note>
   <para>
     The default GIN operator class for <type>jsonb</type> supports queries with
     the key-exists operators <literal>?</literal>, <literal>?|</literal>


Best,

David


Вложения

Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:

> That's also my understanding.  We had a discussion about the docs on @@, @?, and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be
useful.

Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators.

On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote:

> Issue 3: Index Use for Comparison
> ---------------------------------
>
> From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path
querywould be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage
ofthe GIN index, apparently the >= operator cannot: 
>
> david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
>                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
> Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
>   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
>   Rows Removed by Filter: 36081
> Planning Time: 1.864 ms
> Execution Time: 36.338 ms
> (5 rows)
>
> Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes
ofpaths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is
thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on
`movie-> 'year'`? 

Thanks,

David


Вложения

Re: JSON Path and GIN Questions

От
Tom Lane
Дата:
"David E. Wheeler" <david@justatheory.com> writes:
> On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
>> is pretty clear about what is or is not supported.

> How do you feel about this note, then?

I think it's unnecessary.  If we did consider it necessary,
why wouldn't just about every subsection in chapter 8 need
similar wording?

            regards, tom lane



Re: JSON Path and GIN Questions

От
Erik Rijkers
Дата:
Op 9/15/23 om 22:27 schreef David E. Wheeler:
> On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
> 
>> That's also my understanding.  We had a discussion about the docs on @@, @?, and
>> jsonb_path_query on -general a while back [1].  Maybe it's useful also.
> 
> Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be
useful.
> 
> Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators.
> 
> On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote:
> 
>> Issue 3: Index Use for Comparison
>> ---------------------------------
>>
>>  From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON
Pathquery would be able to use the GIN index. However while the use of the == JSON Path operator is able to take
advantageof the GIN index, apparently the >= operator cannot:
 
>>
>> david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
>>                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
>> Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
>>    Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
>>    Rows Removed by Filter: 36081
>> Planning Time: 1.864 ms
>> Execution Time: 36.338 ms
>> (5 rows)
>>
>> Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes
ofpaths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is
thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on
`movie-> 'year'`?
 
> 

movie @? '$ ?($.year >= 2023)'

I believe it is indeed not possible to have such a unequality-search use 
the GIN index.  It is another weakness of JSON that can be unexpected to 
those not in the fullness of Knowledge of the manual. Yes, this too 
would be good to explain in the doc where JSON indexes are explained.

Erik Rijkers

> Thanks,
> 
> David
> 



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think that that indicates that you're putting the info in the
> wrong place.  Perhaps the right answer is to insert something
> more explicit in section 11.2, which is the first place where
> we really spend any effort discussing what can be indexed.

Fair enough. How ’bout this?

--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link
    linkend="bloom">bloom</link>.
    Each index type uses a different
-   algorithm that is best suited to different types of queries.
+   algorithm that is best suited to different types of queries and operators.
    By default, the <link linkend="sql-createindex"><command>CREATE
    INDEX</command></link> command creates
    B-tree indexes, which fit the most common situations.
@@ -132,6 +132,14 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
 </programlisting>
   </para>

+  <note>
+    <para>
+    Only operators on indexed columns are considered as index qualifications.
+    Functions never qualify for index usage, aside from
+    <link linkend="indexes-expressional">indexes on expressions</link>.
+    </para>
+  </note>
+
   <sect2 id="indexes-types-btree">
    <title>B-Tree</title>



Вложения

Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote:

> movie @? '$ ?($.year >= 2023)'
>
> I believe it is indeed not possible to have such a unequality-search use the GIN index.  It is another weakness of
JSONthat can be unexpected to those not in the fullness of Knowledge of the manual. Yes, this too would be good to
explainin the doc where JSON indexes are explained. 

Is that a limitation of GIN indexes in general? Or could there be opclass improvements in the future that would enable
suchcomparisons? 

Thanks,

David


Вложения

Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:

>> If so, I’d like to submit a patch to the docs talking about this, and
>> suggesting the use of jsonb_path_query() to test paths to see if they return
>> a boolean or not.
>
> +1

I’ve started work on this; there’s so much to learn! Here’s a new example that surprised me a bit. Using the GPS
trackerexample from the docs [1] loaded into a `:json` psql variable, this output of this query makes perfect sense to
me:

david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)');
 jsonb_path_query
------------------
 13.4034
 13.2635

Because `[*]` selects all the values. This, however, I did not expect:

david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)');
 jsonb_path_query
------------------
 13.4034
 13.2635
(2 rows)

I had expected it to return two single-value arrays, instead:

 [13.4034]
 [13.2635]

It appears that the filter expression is doing some sub-selection, too. Is that expected?

Best,

David

  [1]: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH

Вложения

Re: JSON Path and GIN Questions

От
Erik Wienhold
Дата:
On 16/09/2023 22:19 CEST David E. Wheeler <david@justatheory.com> wrote:

> On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote:
>
> > movie @? '$ ?($.year >= 2023)'
> >
> > I believe it is indeed not possible to have such a unequality-search use
> > the GIN index.  It is another weakness of JSON that can be unexpected to
> > those not in the fullness of Knowledge of the manual. Yes, this too would
> > be good to explain in the doc where JSON indexes are explained.
>
> Is that a limitation of GIN indexes in general? Or could there be opclass
> improvements in the future that would enable such comparisons?

This detail is mentioned in docs [1]:

"For these operators, a GIN index extracts clauses of the form
 **accessors_chain = constant** out of the jsonpath pattern, and does the
 index search based on the keys and values mentioned in these clauses."

I don't know if this is a general limitation of GIN indexes or just how these
operators are implemented right now.

[1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING

--
Erik



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 16, 2023, at 16:50, Erik Wienhold <ewie@ewie.name> wrote:

> "For these operators, a GIN index extracts clauses of the form
> **accessors_chain = constant** out of the jsonpath pattern, and does the
> index search based on the keys and values mentioned in these clauses."
>
> I don't know if this is a general limitation of GIN indexes or just how these
> operators are implemented right now.
>
> [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING


The detail that jumps out at me is this one on jsonb_path_ops:

“Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it”

Because jsonb_path_ops indexes hashes, I would assume it would only support path equality. But it’s not clear to me
fromthese docs that jsonb_ops also indexes hashes. Does it? 

Best,

D


Вложения

Re: JSON Path and GIN Questions

От
Erik Wienhold
Дата:
On 16/09/2023 22:26 CEST David E. Wheeler <david@justatheory.com> wrote:

> I’ve started work on this; there’s so much to learn! Here’s a new example
> that surprised me a bit. Using the GPS tracker example from the docs [1]
> loaded into a `:json` psql variable, this output of this query makes perfect
> sense to me:
>
> david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)');
>  jsonb_path_query
> ------------------
>  13.4034
>  13.2635
>
> Because `[*]` selects all the values. This, however, I did not expect:
>
> david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)');
>  jsonb_path_query
> ------------------
>  13.4034
>  13.2635
> (2 rows)
>
> I had expected it to return two single-value arrays, instead:
>
>  [13.4034]
>  [13.2635]
>
> It appears that the filter expression is doing some sub-selection, too.
> Is that expected?

Looks like the effect of lax mode which may unwrap arrays when necessary [1].
The array unwrapping looks like the result of jsonb_array_elements().

It kinda works in strict mode:

    SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)');

       jsonb_path_query
    -----------------------
     [47.763, 13.4034]
     [47.706, 13.2635]
    (2 rows)

But it does not remove elements from the matching arrays.  Which I don't even
expect here because the path specifies the location array as the object to be
returned.  The filter expression then only decides whether to return the
location array or not.  Nowhere in the docs does it say that the filter
expression itself removes any elements from a matched array.

Here's a query that filter's out individual array elements.  It's quite a
mouthful (especially to preserve the order of array elements):

    WITH location AS (
      SELECT loc, row_number() OVER () AS array_num
      FROM jsonb_path_query(:'json', 'strict $.track.segments[*].location') loc
    ),
    element AS (
      SELECT array_num, e.num AS elem_num, e.elem
      FROM location
        CROSS JOIN jsonb_array_elements(loc) WITH ORDINALITY AS e (elem, num)
    )
    SELECT jsonb_agg(elem ORDER BY elem_num)
    FROM element
    WHERE jsonb_path_exists(elem, '$ ? (@ < 14)')
    GROUP BY array_num;

       jsonb_agg
    ---------------
     [13.2635]
     [13.4034]
    (2 rows)

[1] https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES

--
Erik



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 16, 2023, at 18:13, Erik Wienhold <ewie@ewie.name> wrote:

> Looks like the effect of lax mode which may unwrap arrays when necessary [1].
> The array unwrapping looks like the result of jsonb_array_elements().
>
> It kinda works in strict mode:
>
> SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)');
>
>    jsonb_path_query
> -----------------------
>  [47.763, 13.4034]
>  [47.706, 13.2635]
> (2 rows)
>
> But it does not remove elements from the matching arrays.  Which I don't even
> expect here because the path specifies the location array as the object to be
> returned.  The filter expression then only decides whether to return the
> location array or not.  Nowhere in the docs does it say that the filter
> expression itself removes any elements from a matched array.

Yes, this is what I expected. It means “select the location array if any of its contents is less that 14.”

I don’t understand why it’s different in lax mode, though, as `@[*]` is not a structural error; it confirms to the
schema,as the docs say. The flattening in this case seems weird. 

Ah, here’s why:, from the docs:

"Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays
out-of-the-box.”

There follow some discussion of the need to specify `[*]` on segments in strict mode, but since that’s exactly what my
exampledoes (and the same for the locations array inside the filter), it doesn’t seem right to me that it would be
unwrappedhere. 

> Here's a query that filter's out individual array elements.  It's quite a
> mouthful (especially to preserve the order of array elements):

Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of thing!

Best,

David


Вложения

Re: JSON Path and GIN Questions

От
Tom Lane
Дата:
"David E. Wheeler" <david@justatheory.com> writes:
> On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that that indicates that you're putting the info in the
>> wrong place.  Perhaps the right answer is to insert something
>> more explicit in section 11.2, which is the first place where
>> we really spend any effort discussing what can be indexed.

> Fair enough. How ’bout this?

After thinking about it for awhile, I think we need some more
discursive explanation of what's allowed, perhaps along the lines
of the attached.  (I still can't shake the feeling that this is
duplicative; but I can't find anything comparable until you get
into the weeds in Section V.)

I put the new text at the end of section 11.1, but perhaps it
belongs a little further up in that section; it seems more
important than some of the preceding paras.

            regards, tom lane

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 55122129d5..1a0b003fb0 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -109,6 +109,39 @@ CREATE INDEX test1_id_index ON test1 (id);
    Therefore indexes that are seldom or never used in queries
    should be removed.
   </para>
+
+  <para>
+   In general, <productname>PostgreSQL</productname> indexes can be used
+   to optimize queries that contain one or more <literal>WHERE</literal>
+   or <literal>JOIN</literal> clauses of the form
+
+<synopsis>
+<replaceable>indexed-column</replaceable> <replaceable>indexable-operator</replaceable>
<replaceable>comparison-value</replaceable>
+</synopsis>
+
+   Here, the <replaceable>indexed-column</replaceable> is whatever
+   column or expression the index has been defined on.
+   The <replaceable>indexable-operator</replaceable> is an operator that
+   is a member of the index's <firstterm>operator class</firstterm> for
+   the indexed column.  (More details about that appear below.)
+   And the <replaceable>comparison-value</replaceable> can be any
+   expression that is not volatile and does not reference the index's
+   table.
+  </para>
+
+  <para>
+   In some cases the query planner can extract an indexable clause of
+   this form from another SQL construct.  A simple example is that if
+   the original clause was
+
+<synopsis>
+<replaceable>comparison-value</replaceable> <replaceable>operator</replaceable>
<replaceable>indexed-column</replaceable>
+</synopsis>
+
+   then it can be flipped around into indexable form if the
+   original <replaceable>operator</replaceable> has a commutator
+   operator that is a member of the index's operator class.
+  </para>
  </sect1>


@@ -120,7 +153,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link
    linkend="bloom">bloom</link>.
    Each index type uses a different
-   algorithm that is best suited to different types of queries.
+   algorithm that is best suited to different types of indexable clauses.
    By default, the <link linkend="sql-createindex"><command>CREATE
    INDEX</command></link> command creates
    B-tree indexes, which fit the most common situations.

Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 17, 2023, at 12:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> After thinking about it for awhile, I think we need some more
> discursive explanation of what's allowed, perhaps along the lines
> of the attached.  (I still can't shake the feeling that this is
> duplicative; but I can't find anything comparable until you get
> into the weeds in Section V.)
>
> I put the new text at the end of section 11.1, but perhaps it
> belongs a little further up in that section; it seems more
> important than some of the preceding paras.

I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable
operators.True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar
fora function call, they are in some sense the same thing. 

A header might be useful, something like “What Counts as an indexable expression”.

Best,

David


Вложения

Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:

>> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is that
>> right?
>
> That's also my understanding.  We had a discussion about the docs on @@, @?, and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Hi, finally getting back to this, still fiddling to figure out the differences. From the thread you reference [1], is
thepoint that @@ and jsonb_path_match() can only be properly used with a JSON Path expression that’s a predicate check? 

If so, as far as I can tell, only exists() around the entire path query, or the deviation from the SQL standard that
allowsan expression to be a predicate? 

This suggest to me that the "Only the first item of the result is taken into account” bit from the docs may not be
quiteright. Consider this example: 

david=#  select jsonb_path_query('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
 jsonb_path_query
------------------
 false
 false
(2 rows)

david=#  select jsonb_path_match('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
ERROR:  single boolean result is expected

jsonb_path_match(), it turns out, only wants a single result. But furthermore perhaps the use of a filter predicate
ratherthan a predicate expression for the entire path query is an error? 

Curiously, @@ seems okay with it:

david=#  select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
 ?column?
----------
 t

Not a predicate query, and somehow returns true even though the first item of the result is false? Is that how it
shouldbe? 

Best,

David

[1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com



Re: JSON Path and GIN Questions

От
Erik Wienhold
Дата:
On 2023-10-09 01:13 +0200, David E. Wheeler write:
> On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
> 
> >> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
> >> and from the suggestion I got there, it seems that @@ expects a boolean to be
> >> returned by the path query, while @? wraps it in an implicit exists(). Is that
> >> right?
> > 
> > That's also my understanding.  We had a discussion about the docs on @@, @?, and
> > jsonb_path_query on -general a while back [1].  Maybe it's useful also.
> 
> Hi, finally getting back to this, still fiddling to figure out the
> differences. From the thread you reference [1], is the point that @@
> and jsonb_path_match() can only be properly used with a JSON Path
> expression that’s a predicate check?

I think so.  That's also supported by the existing docs which only
mention "JSON path predicate" for @@ and jsonb_path_match().

> If so, as far as I can tell, only exists() around the entire path
> query, or the deviation from the SQL standard that allows an
> expression to be a predicate?

Looks like that.  But note that exists() is also a filter expression.
So wrapping the entire jsonpath in exists() is also a deviation from the
SQL standard which only allows predicates in filter expressions, i.e.
'<path> ? (<predicate>)'.

> This suggest to me that the "Only the first item of the result is
> taken into account” bit from the docs may not be quite right.

Yes, this was also the issue in the referenced thread[1].  I think my
suggesstion in [2] explains it (as far as I understand it).

> Consider this example:
> 
> david=#  select jsonb_path_query('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
>  jsonb_path_query
> ------------------
>  false
>  false
> (2 rows)
> 
> david=#  select jsonb_path_match('{"a":[false,true,false]}',  '$.a ?(@[*] == false)');
> ERROR:  single boolean result is expected
> 
> jsonb_path_match(), it turns out, only wants a single result. But
> furthermore perhaps the use of a filter predicate rather than a
> predicate expression for the entire path query is an error?

Yes, I think @@ and jsonb_path_match() should not be used with filter
expressions because the jsonpath returns whatever the path expression
yields (which may be an actual boolean value in the jsonb).  The filter
expression only filters (as the name suggests) what the path expression
yields.

> Curiously, @@ seems okay with it:
> 
> david=#  select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
>  ?column? 
> ----------
>  t
> 
> Not a predicate query, and somehow returns true even though the first
> item of the result is false? Is that how it should be?

Your example does a text search equivalent to:

    select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)')

You forgot the cast to jsonb.  jsonb @@ jsonpath actually returns null:

    test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
     ?column?
    ----------
     <null>
    (1 row)

This matches the note right after the docs for @@:

"The jsonpath operators @? and @@ suppress the following errors: missing
 object field or array element, unexpected JSON item type, datetime and
 numeric errors. The jsonpath-related functions described below can also
 be told to suppress these types of errors. This behavior might be
 helpful when searching JSON document collections of varying structure."

That would be the silent argument of jsonb_path_match():

    test=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ? (@[*] == false)', silent => true);
     jsonb_path_match 
    ------------------
     <null>
    (1 row)

[1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/880194083.579916.1680598906819%40office.mailbox.org

-- 
Erik



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
Thanks for the reply, Erik. Have appreciated collaborating with you on a few different things lately!

> On Oct 13, 2023, at 22:50, Erik Wienhold <ewie@ewie.name> wrote:

>> Hi, finally getting back to this, still fiddling to figure out the
>> differences. From the thread you reference [1], is the point that @@
>> and jsonb_path_match() can only be properly used with a JSON Path
>> expression that’s a predicate check?
>
> I think so.  That's also supported by the existing docs which only
> mention "JSON path predicate" for @@ and jsonb_path_match().

Okay, good.

>> If so, as far as I can tell, only exists() around the entire path
>> query, or the deviation from the SQL standard that allows an
>> expression to be a predicate?
>
> Looks like that.  But note that exists() is also a filter expression.
> So wrapping the entire jsonpath in exists() is also a deviation from the
> SQL standard which only allows predicates in filter expressions, i.e.
> '<path> ? (<predicate>)'.

Yeah. I’m starting to get the sense that the Postgres extension of the standard to allow predicates without filters is
almosta different thing, like there are two Pg SQL/JSON Path languages: 

1. SQL Standard path language for selecting values and includes predicates. Returns the selected value(s). Supported by
`@?`and jsonb_path_exists(). 

2. The Postgres predicate path language which returns a boolean, akin to a WHERE expression. Supported by `@@` and
jsonb_path_match()

Both are supported by jsonb_path_query(), but if you use a standard path you get the values and if you use a predicate
pathyou get a boolean. This feels a big overloaded to me, TBH; I find myself wanting them to be separate types since
thebehaviors vary quite a bit! 

>> This suggest to me that the "Only the first item of the result is
>> taken into account” bit from the docs may not be quite right.
>
> Yes, this was also the issue in the referenced thread[1].  I think my
> suggesstion in [2] explains it (as far as I understand it).

Yeah, lax vs. strict mode stuff definitely creates some added complexity. I see now I missed the rest of that thread;
seeingthe entire thread on one page[1] really helps. I’d like to take a stab at the doc improvements Tom suggests[2]. 

>> jsonb_path_match(), it turns out, only wants a single result. But
>> furthermore perhaps the use of a filter predicate rather than a
>> predicate expression for the entire path query is an error?
>
> Yes, I think @@ and jsonb_path_match() should not be used with filter
> expressions because the jsonpath returns whatever the path expression
> yields (which may be an actual boolean value in the jsonb).  The filter
> expression only filters (as the name suggests) what the path expression
> yields.

Agreed. It only gets worse with a filter expression that selects a single value:

david=# select jsonb_path_match('{"a":[false,true]}',  '$.a ?(@[*] == false)');
 jsonb_path_match
------------------
 f

Presumably it returns false because the value selected is JSON `false`:

david=# select jsonb_path_query('{"a":[false,true]}',  '$.a ?(@[*] == false)');
 jsonb_path_query
------------------
 false

Which seems misleading, frankly. Would it be possible to update jsonb_path_match and @@ to raise an error when the path
expressionis not a predicate? 


>> Curiously, @@ seems okay with it:
>>
>> david=#  select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
>> ?column?
>> ----------
>> t
>>
>> Not a predicate query, and somehow returns true even though the first
>> item of the result is false? Is that how it should be?
>
> Your example does a text search equivalent to:
>
> select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)')
>
> You forgot the cast to jsonb.

Oh good grief 🤦🏻‍♂️

> jsonb @@ jsonpath actually returns null:
>
> test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
>  ?column?
> ----------
>  <null>
> (1 row)

Yes, much better, though see the result above that returns a single `false` and confuses things.

> This matches the note right after the docs for @@:

Yeah, that makes sense. But here’s a bit about lax mode[3] that confuses me:

> The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform
tothe expected schema. If an operand does not match the requirements of a particular operation, it can be automatically
wrappedas an SQL/JSON array or unwrapped by converting its elements into an SQL/JSON sequence before performing this
operation.Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare
SQL/JSONarrays out-of-the-box. 

This automatic flattening in lax mode seems odd, because it means you get different results in strict and lax mode
wherethere are no errors. In lax mode, you get a set: 

david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)

But in strict mode, you get the array selected by `$.a`, which is more what I would expect:

david=# select jsonb_path_query('{"a":[1,2,3,4,5]}',  'strict $.a ?(@[*] > 2)');
 jsonb_path_query
------------------
 [1, 2, 3, 4, 5]

This seems like an odd inconsistency in return values, but perhaps the standard calls for this? I don’t have access to
it,but MSSQL docs[4], at least, say: 

> * In **lax** mode, the function returns empty values if the path expression contains an error. For example, if you
requestthe value **$.name**, and the JSON text doesn't contain a **name** key, the function returns null, but does not
raisean error. 
>
> * In **strict** mode, the function raises an error if the path expression contains an error.

No flattening, only error suppression. The Oracle docs[5] mention array flattening, but I don’t have it up and running
tosee if that means query *results* are flattened. 

Best,

David


[1] https://www.postgresql.org/message-id/flat/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com

[2] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us

[3] https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES

[4]
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-path-expressions-sql-server?view=sql-server-ver16#PATHMODE

[5]
https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-path-expressions.html#GUID-8656CAB9-C293-4A99-BB62-F38F3CFC4C13


Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Sep 17, 2023, at 18:09, David E. Wheeler <david@justatheory.com> wrote:

> I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable
operators.True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar
fora function call, they are in some sense the same thing. 
>
> A header might be useful, something like “What Counts as an indexable expression”.

Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it
there.

Thanks,

David




Re: JSON Path and GIN Questions

От
Tom Lane
Дата:
"David E. Wheeler" <david@justatheory.com> writes:
> Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it
there.

I'd waited because the discussion was still active, and then it
kind of slipped off the radar.  I'll take another look and push
some form of what I suggested.  That doesn't really address the
jsonpath oddities you were on about, though.

            regards, tom lane



Re: JSON Path and GIN Questions

От
"David E. Wheeler"
Дата:
On Dec 17, 2023, at 16:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'd waited because the discussion was still active, and then it
> kind of slipped off the radar.  I'll take another look and push
> some form of what I suggested.

Right on.

> That doesn't really address the
> jsonpath oddities you were on about, though.

No, I attempted to address those in [a patch][1].

  [1]: https://commitfest.postgresql.org/45/4624/

Best,

David