Обсуждение: BUG #16828: duplicate results when using ** recursive expression in JSON path

Поиск
Список
Период
Сортировка

BUG #16828: duplicate results when using ** recursive expression in JSON path

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16828
Logged by:          Thomas Kellerer
Email address:      shammat@gmx.net
PostgreSQL version: 13.1
Operating system:   Windows 10, CentOS 8
Description:

The following query 

  select jsonb_path_query_array(col, '$.**.itemName')
  from (
    values ('{"items": [{"itemName": "a", "items": [{"itemName":
"b"}]}]}'::jsonb)
  ) as t(col) 

returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
only appears once in the JSON value. 

This also happens on Postgres 12.x


Re: BUG #16828: duplicate results when using ** recursive expression in JSON path

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following query 

>   select jsonb_path_query_array(col, '$.**.itemName')
>   from (
>     values ('{"items": [{"itemName": "a", "items": [{"itemName":
> "b"}]}]}'::jsonb)
>   ) as t(col) 

> returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> only appears once in the JSON value. 

jsonpath_exec.c is a muddle of undocumented code, but I think
I found where the duplication is occurring: in executeAnyItem(),
the first occurrence comes out while recursing down from the
executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
the duplicate comes out while recursing down from the executeAnyItem
call at line 1439.  So I'd say that that logic for
"ignoreStructuralErrors" needs a rethink.  (BTW, why is half of this
code relying on cxt->ignoreStructuralErrors while the other half
receives a passed-down flag?  That seems like a recipe for bugs,
especially with the lack of commentary about the reason for it.)

Alexander, git blame says all this code is your fault ...

            regards, tom lane



Re: BUG #16828: duplicate results when using ** recursive expression in JSON path

От
Alexander Korotkov
Дата:
On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> >   select jsonb_path_query_array(col, '$.**.itemName')
> >   from (
> >     values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> >   ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439.  So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink.  (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag?  That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)

I think that's not just a bug in the code, but a high-level problem.

** operator enumerates all the subdocuments.

# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
                       jsonb_path_query
--------------------------------------------------------------
 {"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
 [{"items": [{"itemName": "b"}], "itemName": "a"}]
 {"items": [{"itemName": "b"}], "itemName": "a"}
 [{"itemName": "b"}]
 {"itemName": "b"}
 "b"
 "a"
(7 rows)

After that, .itemName accessor is applied.  But in the lax mode this accessor automatically unwraps the arrays.

# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
                            subres                            | jsonb_path_query_array
--------------------------------------------------------------+------------------------
 {"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
 [{"items": [{"itemName": "b"}], "itemName": "a"}]            | ["a"]
 {"items": [{"itemName": "b"}], "itemName": "a"}              | ["a"]
 [{"itemName": "b"}]                                          | ["b"]
 {"itemName": "b"}                                            | ["b"]
 "b"                                                          | []
 "a"                                                          | []
(7 rows)

So, everything works as designed, but the design is probably wrong.  The issue here is that ** is our extension of the standard, and it works strangely in lax mode.  But in strict mode everything looks OK.

#  select jsonb_path_query_array('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
 jsonb_path_query_array
------------------------
 ["a", "b"]
(1 row)

Should we consider everything that comes after ** always in strict mode?

> Alexander, git blame says all this code is your fault ...

I do agree all faults in this code are mine.  I do agree there are faults in this code.  But I don't think *all* this code is fault :)

I'll plan my time to improve the clarity of this code.

------
Regards,
Alexander Korotkov