Unable to make use of "deep" JSONB index

Поиск
Список
Период
Сортировка
От Shaheed Haque
Тема Unable to make use of "deep" JSONB index
Дата
Msg-id CAHAc2jdaD63ANRFuQpzZ13XibyP1QPqRUUDHTznRVep48e=7nA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Unable to make use of "deep" JSONB index  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hi,

I'm having trouble using an index on the "deep" innards of a JSONB field in that the matching query it is trying to accelerate never uses it. I did get some advice on a simplified version of the problem at [1], but the actual problem remains the same in that "EXPLAIN ANALYZE" never refers to the index. Here are the details including a test case below...

1. The JSONB can be several MB in size. This works fine for all but one access pattern.
2. The JSON in the problem use case looks like this:

{
  "...stuff...": ...
  "employee": {
    "999": {"id": 999, "integer attribute": 0, "boolean-may-be-missing": true, "state": {
      "nested-list": [[], [], ...]
    }
  }
}

3. As per the discussion at [2], using a SELECT with a WHERE on the 3 attributes of interest ("integer attribute", the "boolean-may-be-missing" and "nested-list") incurs a significant overheard which suggests that the JSONB storage is being accessed 3 times. In order to optimise for this case, I constructed a query using the jsonpath support which seems to successfully avoid the triple-fetch by keeping the logic inside the jsonpath query like this:

WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment
== true || @.state.employment[last][2] == 0)')

4. Then I created an index "matching" this query.
5. According to EXPLAIN ANALYSE, the index is never used.


=== version and platform ===

Version: PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
Platform: Ubuntu 22.04 (Jammy), using OS-supplied build

=== test case ===
CREATE TABLE payrun (
    id serial primary key,
    snapshot JSONB
);

INSERT INTO payrun(snapshot)
VALUES
  ('{"employee": {"999": {"id": 999, "state": {"employment": [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment": false}, "111": {"id": 111, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
  ('{"employee": {"999": {"id": 999, "state": {"employment": [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment": true}, "222": {"id": 222, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
  ('{"employee": {"998": {"id": 998, "state": {"employment": [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment": false}, "333": {"id": 333, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
;

SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)');

--
-- Create index designed to match the query.
--
create index idx1 on payrun using gin ((snapshot->'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)'));

set enable_seqscan = OFF;

--
-- EXPLAIN ANALYZE ...query above...
--
explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment
== true || @.state.employment[last][2] == 0)');
                                                                        QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on payrun  (cost=10000000000.00..10000000001.04 rows=1 width=36) (actual time=0.040..0.042 rows=1 loops=1)
  Filter: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0 || @."last_run_of_employment" == true) || @."state"."employment"[last][2] == 0)'::jsonpath)
  Rows Removed by Filter: 2
Planning Time: 0.883 ms
Execution Time: 0.078 ms
(5 rows)



=== end test case ===

The expected result is that with enable_seqscan = OFF, the index should be used, but instead a sequential scan is reported as above. The same happens without enable_seqscan = OFF on
a moderately large test set of over 2000 rows (with extended run times circa 10+ seconds on my hardware).

I have tried the same with similar results on PG13.

Have I constructed the index incorrectly, or is there some other way to convince the query to use it?

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: How is this possible "publication does not exist"
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Unable to make use of "deep" JSONB index