Обсуждение: Adding non-selective key to jsonb query @> reduces performance?

Поиск
Список
Период
Сортировка

Adding non-selective key to jsonb query @> reduces performance?

От
Marcin Krupowicz
Дата:
Hi,

I have a table "tbl" with a couple of columns. One of them is "row"
jsonb. It has a GIN index as per below. The table isn't particularly
large, in lower tens of GB. Each "row" has maybe 5-20 keys, nothing
crazy.
Now, when I query it with @> operator I get very different performance
depending on the selection of keys I want to look for. The queries
below return the same result set (just a few rows). I think I have
narrowed the problem down to uniqueness of a given key. For example
this query is fast:

Q1
select count(*) from tbl where row @> '{"SELECTIVE_COL":
"SearchValue", "DATE": "20220606"}'::jsonb
It takes about 0.6ms execution time

However this one, is slow:

Q2
select count(*) from tbl where row @> '{"SELECTIVE_COL":
"SearchValue", "DATE": "20220606", "NON_SELECTIVE_COL": "Abc"}'::jsonb
It takes 17ms

Note that the only difference is adding one more - not very unique -
key. If in Q2 I replaced NON_SELECTIVE_COL with another selective
column, it's becoming fast again.

Here are the query plans:
Q1: https://explain.depesz.com/s/qxU8
Q2: https://explain.depesz.com/s/oIW3
Both look very similar, apart from a very different number of shared
buffers hit.

Index on "row":
"tbl_row_idx" gin ("row" jsonb_path_ops) WITH (fastupdate=off) WHERE
upper_inf(effective_range) AND NOT deleted

PG Version: 14.3, work_mem 512MB

What are my options? Why is the second query so much slower? I changed
Q2 to conjunction of conditions on single columns (row @> '..' and row
@> ...) and it was fast, even with the NON_SELECTIVE_COL included.
Sadly it will be difficult for me do to this in my code without using
dynamic SQL.

Many thanks,
-- Marcin



Re: Adding non-selective key to jsonb query @> reduces performance?

От
Tom Lane
Дата:
Marcin Krupowicz <ma@rcin.me> writes:
> However this one, is slow:

> Q2
> select count(*) from tbl where row @> '{"SELECTIVE_COL":
> "SearchValue", "DATE": "20220606", "NON_SELECTIVE_COL": "Abc"}'::jsonb
> It takes 17ms

> Note that the only difference is adding one more - not very unique -
> key. If in Q2 I replaced NON_SELECTIVE_COL with another selective
> column, it's becoming fast again.

This doesn't surprise me a whole lot based on what I know of GIN.
It's going to store sets of TIDs associated with each key or value
mentioned in the data, and then a query will have to AND the sets
of TIDs for keys/values mentioned in the query.  That will take
longer when some of those sets are big.

It might be worth experimenting with an index built using the
non-default jsonb_path_ops opclass [1].  I'm not sure if that'd be
faster for this scenario, but it seems worth trying.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING