Discussion: JSONB index not in use, but is TOAST the real cause of slow query?


JSONB index not in use, but is TOAST the real cause of slow query?

From:
Shaheed Haque
Date:
Hi,

I have a database table with a modest number of rows (<1000) but where
one column in the table is a JSONB "snapshot" which can be a few MB in
size. Generally, this is a great fit for the read-write access
patterns involved, but there is a read-query which is VERY slow.

I've searched via Google and in this mailing list archive and found
lots of material, but nothing that explains what is going on...or how
to fix it. The table columns look like this:

    ...a primary key...
    ...various other keys including a couple of FKs...
    snapshot JSONB

and before I did anything, the indices looked like this:

   "paiyroll_payrun_pkey" PRIMARY KEY, btree (process_ptr_id)
   "paiyroll_payrun_company_id_ce341888" btree (company_id)
   "paiyroll_payrun_schedule_id_1593f55f" btree (schedule_id)

The data in one row's "snapshot" looks a bit like this:

{
    "stuff": {},
    "more other stuff": {},
    "employee": {
         "1234": {},
         "56789": {}
    }
}

The query that is slow can be approximated like this:

   SELECT snapshot ->'stuff'->'item' FROM paiyroll_payrun WHERE
snapshot ->'employee' ? '2209';

When I add this index:

    CREATE INDEX idx1 ON paiyroll_payrun USING gin ((snapshot ->'employee'));

the analyser says this:

======================
foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM
paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
(actual time=50.185..2520.983 rows=104 loops=1)
  Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Filter: 835
Planning Time: 0.075 ms
Execution Time: 2521.004 ms
(5 rows)
======================

So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
presume the INDEX is not used because of the number of rows? Is there
a way to verify that? And how can I understand the dreadful amount of
time (of course, this is just on my dev machine, but still...)? Is
there a way to see/tweak what TOAST costs or indeed to confirm if it
is even in use?
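(For reference, a sketch of how one might check this, assuming the table name above: `pg_class.reltoastrelid` shows whether the table has a TOAST table at all, and `pg_column_size` reports the stored, possibly compressed, size of each value.)

```sql
-- A sketch, not from the thread: is there a TOAST table, and how big
-- are the stored snapshot values?
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'paiyroll_payrun';

SELECT pg_column_size(snapshot)     AS on_disk_bytes,  -- stored (possibly compressed) size
       octet_length(snapshot::text) AS detoasted_bytes -- size after detoasting
FROM paiyroll_payrun
ORDER BY on_disk_bytes DESC
LIMIT 5;
```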

Any help appreciated.

Thanks, Shaheed



Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Tom Lane
Date:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> ======================
> foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM
> paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
>                                                   QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
> (actual time=50.185..2520.983 rows=104 loops=1)
>   Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
>   Rows Removed by Filter: 835
> Planning Time: 0.075 ms
> Execution Time: 2521.004 ms
> (5 rows)
> ======================

> So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
> presume the INDEX is not used because of the number of rows? Is there
> a way to verify that?

You could do "set enable_seqscan = off" and see if the EXPLAIN
results change.  My guess is that you'll find that the indexscan
alternative is costed at a bit more than 29.13 units and thus
the planner thinks seqscan is cheaper.
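That experiment might look like this (session-local only, so it affects nothing else):

```sql
-- Force the planner away from seqscans, compare the plan, then restore.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT snapshot -> 'company' -> 'legal_name'
FROM paiyroll_payrun
WHERE snapshot -> 'employee' ? '2209';
RESET enable_seqscan;
```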

> And how can I understand the dreadful amount of
> time (of course, this is just on my dev machine, but still...)?

In the seqscan case, the -> operator is going to retrieve the whole
JSONB value from each row, which of course is pretty darn expensive
if it's a few megabytes.  Unfortunately the planner doesn't account
for detoasting costs when making such estimates, so it doesn't
realize that the seqscan case is going to be expensive.  (Fixing
that has been on the to-do list for a long time, but we seldom
see cases where it matters this much, so it hasn't gotten done.)

The problem would likely go away by itself if your table had more
than a few hundred rows, but if you don't anticipate that happening
then you need some sort of band-aid.  I don't recommend turning
enable_seqscan off as a production fix; it'd likely have negative
effects on other queries.  Personally I'd experiment with reducing
random_page_cost a bit to see if I could encourage use of the index
that way.  The default value of 4.0 is tuned for spinning-rust
storage and is not too appropriate for a lot of modern hardware,
so there's probably room to fix it that way without detuning your
setup for other queries.
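For example (the value and database name here are illustrative, not a recommendation from the thread):

```sql
-- Try a lower random_page_cost in the session first; 1.1 is a common
-- starting point for SSD storage.
SET random_page_cost = 1.1;
EXPLAIN ANALYZE
SELECT snapshot -> 'company' -> 'legal_name'
FROM paiyroll_payrun
WHERE snapshot -> 'employee' ? '2209';
-- If the index is now chosen, consider persisting it per-database:
-- ALTER DATABASE mydb SET random_page_cost = 1.1;
```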

You should probably also rethink whether you really want to store
your data in this format, because anything at all that you do with
that big JSONB column is going to be expensive.  (Another thing
that's been on the to-do list for awhile is enabling partial
retrieval of large JSONB values, but AFAIK that hasn't happened
yet either.)

            regards, tom lane



Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Shaheed Haque
Date:
Tom,

Thanks for the considered advice and insights. My takeaway is that
based on what I've said, you are mostly unsurprised by the results I
see. In the longer term, the number of rows will increase but I will
have to ponder options for the immediate future. I'll have a play with
the knobs you suggested and will report back with anything of note.

One last thought about TOAST. If the cost of the -> retrieving the
data cannot be obviated, is there any way to tweak how that works?

Thanks, Shaheed




Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Adrian Klaver
Date:
On 5/28/22 12:38, Shaheed Haque wrote:
> Tom,
> 
> Thanks for the considered advice and insights. My takeaway is that
> based on what I've said, you are mostly unsurprised by the results I
> see. In the longer term, the number of rows will increase but I will
> have to ponder options for the immediate future. I'll have a play with
> the knobs you suggested and will report back with anything of note.
> 
> One last thought about TOAST. If the cost of the -> retrieving the
> data cannot be obviated, is there any way to tweak how that works?

See:

https://www.postgresql.org/docs/current/storage-toast.html

and from there:

https://www.postgresql.org/docs/current/sql-altertable.html

Search for SET STORAGE.
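Concretely, that might look like the sketch below. EXTERNAL keeps out-of-line storage but disables compression; note Tom's point later in the thread that only newly stored values are affected, so existing rows need a table rewrite.

```sql
-- Sketch based on the docs linked above.
ALTER TABLE paiyroll_payrun
    ALTER COLUMN snapshot SET STORAGE EXTERNAL;
-- Existing rows keep their old storage until the table is rewritten
-- (e.g. by re-inserting the data, or a forced rewrite of the rows).
```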

It might help with (from Tom Lane):

"Unfortunately the planner doesn't account
for detoasting costs when making such estimates, ..."

but probably not much with:

" ...because anything at all that you do with
that big JSONB column is going to be expensive.  (Another thing
that's been on the to-do list for awhile is enabling partial
retrieval of large JSONB values, but AFAIK that hasn't happened
yet either.)

> 
> Thanks, Shaheed
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Tom Lane
Date:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> One last thought about TOAST. If the cost of the -> retrieving the
> data cannot be obviated, is there any way to tweak how that works?

The only thing that's readily tweakable is to disable data compression
for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note
that you have to rebuild the table for already-stored values to be
changed).  It seems unlikely that that will help you much though,
since doing that would save CPU at the cost of more disk I/O, and
it seems that the I/O side is your bottleneck already.  It *would*
help if jsonb had logic for partial fetches, because that'd require
uncompressed storage to work.  But AFAICS that's not there yet.
I distinctly recall that that was planned for when we were designing
jsonb's on-disk layout, but I see no code actually doing anything
of that sort ... the -> operator certainly doesn't.

            regards, tom lane



Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Shaheed Haque
Date:
Adrian, Tom, thanks for the input. Based on that, it occurred to me to
create some "dummy" rows and - almost magically - the index kicks in!
Before is 2500ms:

============
Seq Scan on paiyroll_payrun  (cost=0.00..52.43 rows=17 width=32)
(actual time=53.127..2567.024 rows=104 loops=1)
  Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Filter: 1835
Planning Time: 0.060 ms
Execution Time: 2567.044 ms
(5 rows)
============

After is 300ms:

============
Bitmap Heap Scan on paiyroll_payrun  (cost=36.11..64.67 rows=14
width=32) (actual time=4.189..311.932 rows=104 loops=1)
  Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Index Recheck: 1
  Heap Blocks: exact=8
  ->  Bitmap Index Scan on idx1  (cost=0.00..36.10 rows=14 width=0)
(actual time=0.087..0.087 rows=105 loops=1)
        Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
Planning Time: 0.167 ms
Execution Time: 311.962 ms
(8 rows)
============

Woot!

Unfortunately, the real query which I think should behave very
similarly is still at the several-seconds level despite using the
index. Before 3600ms:

============
SELECT "paiyroll_payrun"."actual_t"
  FROM "paiyroll_payrun"
 WHERE ("paiyroll_payrun"."company_id" = 173 AND
("paiyroll_payrun"."snapshot" -> 'employee') ? '16376'
   AND NOT (("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','last_run_of_employment']) = 'true'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','pay_graph']) = '0'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','state','employment','-1','2']) > '0'))
 ORDER BY "paiyroll_payrun"."actual_t" DESC
 LIMIT 1
============

============
Limit (cost=31.33..31.33 rows=1 width=4) (actual
time=3595.174..3595.176 rows=1 loops=1)
  -> Sort (cost=31.33..31.33 rows=3 width=4) (actual
time=3595.174..3595.174 rows=1 loops=1)
    Sort Key: actual_t DESC
    Sort Method: top-N heapsort Memory: 25kB
    -> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3
width=4) (actual time=44.575..3595.082 rows=62 loops=1)
       Recheck Cond: (company_id = 173)
       Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND
(((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <>
'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[])
<> '0'::jsonb) OR ((snapshot #>
'{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)))
Rows Removed by Filter: 242
       Heap Blocks: exact=9
         -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304
loops=1)
         Index Cond: (company_id = 173)
Planning Time: 0.258 ms
Execution Time: 3595.195 ms
============

After, with the index in use, 2200ms:

================
Limit (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.989..2258.990 rows=1 loops=1)
  -> Sort (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.988..2258.989 rows=1 loops=1)
    Sort Key: actual_t DESC
    Sort Method: top-N heapsort Memory: 25kB
    -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
width=4) (actual time=32.488..2258.891 rows=62 loops=1)
      Recheck Cond: ((company_id = 173) AND ((snapshot ->
'employee'::text) ? '16376'::text))
      Filter: (((snapshot #>
'{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
'0'::jsonb))
      Heap Blocks: exact=5
        -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
time=0.038..0.039 rows=0 loops=1)
          -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
loops=1)
                Index Cond: (company_id = 173)
          -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
width=0) (actual time=0.021..0.021 rows=62 loops=1)
                Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)
Planning Time: 0.245 ms
Execution Time: 2259.019 ms
===================

IIUC, at the bottom, the indices are doing their thing, but a couple
of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
cannot quite see why. Have I missed a needed index or what? I'm pretty
much a novice at SQL, so it is quite possible I've missed something
very basic.

Thanks, Shaheed




Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Tom Lane
Date:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> Unfortunately, the real query which I think should behave very
> similarly is still at the several-seconds level despite using the
> index. ...

>     -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> width=4) (actual time=32.488..2258.891 rows=62 loops=1)
>       Recheck Cond: ((company_id = 173) AND ((snapshot ->
> 'employee'::text) ? '16376'::text))
>       Filter: (((snapshot #>
> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> '0'::jsonb))
>       Heap Blocks: exact=5
>         -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> time=0.038..0.039 rows=0 loops=1)
>           -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> loops=1)
>                 Index Cond: (company_id = 173)
>           -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> width=0) (actual time=0.021..0.021 rows=62 loops=1)
>                 Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)

> IIUC, at the bottom, the indices are doing their thing, but a couple
> of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> cannot quite see why.

I suppose it's the execution of that "Filter" condition, which will
require perhaps as many as three fetches of the "snapshot" column.

You really need to rethink that data structure.  Sure, you can store tons
of unorganized data in a jsonb column, but you pay for that convenience
with slow access.  Normalizing the bits you need frequently into a more
traditional relational schema is the route to better-performing queries.
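As a sketch of that direction (table and column names below are invented for illustration): the per-employee fields used in the slow query's filter could live in an ordinary relational table, leaving the JSONB column for the bulky remainder.

```sql
-- Hypothetical normalized companion table for the "hot" fields.
CREATE TABLE payrun_employee (
    payrun_id   integer NOT NULL REFERENCES paiyroll_payrun (process_ptr_id),
    employee_id integer NOT NULL,
    last_run_of_employment boolean,
    pay_graph   integer,
    PRIMARY KEY (payrun_id, employee_id)
);
CREATE INDEX ON payrun_employee (employee_id);
-- The filter then becomes a cheap indexed join instead of repeated
-- detoasting of a multi-megabyte snapshot value per row.
```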

            regards, tom lane



Re: JSONB index not in use, but is TOAST the real cause of slow query?

From:
Shaheed Haque
Date:


On Sun, 29 May 2022, 15:58 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> Unfortunately, the real query which I think should behave very
> similarly is still at the several-seconds level despite using the
> index. ...

>     -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> width=4) (actual time=32.488..2258.891 rows=62 loops=1)
>       Recheck Cond: ((company_id = 173) AND ((snapshot ->
> 'employee'::text) ? '16376'::text))
>       Filter: (((snapshot #>
> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> '0'::jsonb))
>       Heap Blocks: exact=5
>         -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> time=0.038..0.039 rows=0 loops=1)
>           -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> loops=1)
>                 Index Cond: (company_id = 173)
>           -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> width=0) (actual time=0.021..0.021 rows=62 loops=1)
>                 Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)

> IIUC, at the bottom, the indices are doing their thing, but a couple
> of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> cannot quite see why.

I suppose it's the execution of that "Filter" condition, which will
require perhaps as many as three fetches of the "snapshot" column.

Thanks, that's clearly in the frame. 

You really need to rethink that data structure.  Sure, you can store tons
of unorganized data in a jsonb column, but you pay for that convenience
with slow access.  Normalizing the bits you need frequently into a more
traditional relational schema is the route to better-performing queries.

Ack. Indeed, the current design works very well for all of the access patterns other than this one, which only recently came into view as a problem. 

Ahead of contemplating a design change, I have been looking at how to optimise this bit. I'm currently mired in a crash course on SQL syntax as it pertains to JSONB, jsonpath et al., and the equally mysterious side effects of "?" and "@>" and so on in terms of the amount of data being fetched etc. (and all wrapped in a dose of ORM for good measure).

I'll write separately with more specific questions if needed on those details. 

Thanks again for the kind help. 

Shaheed


                        regards, tom lane

Is it possible to index "deep" into a JSONB column?

From:
Shaheed Haque
Date:
Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:

  create index idx1 on mytable using gin (snapshot);

In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).

Now, I know it is possible to index inner objects, so if snapshot
looks like this:

{
    "stuff": {},
    "more other stuff": {},
    "employee": {
         "1234": {"date_of_birth": "1970-01-01"},
         "56B789": {"date_of_birth": "1971-02-02"}
    }
}

I can say:

  create index idx2 on mytable using gin ((snapshot -> 'employee'));

But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:

  create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));

I believe an asterisk "*" would work if 'employee' was an array, but
here it is a nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).

Thanks, Shaheed



Re: Is it possible to index "deep" into a JSONB column?

From:
Bryn Llewellyn
Date:
shaheedhaque@gmail.com wrote:

Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:

 create index idx1 on mytable using gin (snapshot);

In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).

Now, I know it is possible to index inner objects, so if snapshot
looks like this:

{
   "stuff": {},
   "more other stuff": {},
   "employee": {
        "1234": {"date_of_birth": "1970-01-01"},
        "56B789": {"date_of_birth": "1971-02-02"}
   }
}

I can say:

 create index idx2 on mytable using gin ((snapshot -> 'employee'));

But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:

 create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));

I believe an asterisk "*" would work if 'employee' was an array, but
here it is a nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).

Try this:

   snapshot -> 'employee'->>'date_of_birth'

Re: Is it possible to index "deep" into a JSONB column?

From:
Shaheed Haque
Date:
Hi Bryn,

On Mon, 30 May 2022 at 03:12, Bryn Llewellyn <bryn@yugabyte.com> wrote:
...
>
> Try this:
>
>    snapshot -> 'employee'->>'date_of_birth'
>

Syntactically, that works:

    create index bryn on paiyroll_payrun using btree ((snapshot ->
'employee'->>'date_of_birth'));

But IIUC it is looking for 'date_of_birth' at the wrong level. It
would need to do something like this:

    create index bryn on paiyroll_payrun using btree ((snapshot ->
'employee'->'2209'->>'date_of_birth'));

To paraphrase, my question is about how to replace the '2209' with all
possible object keys. For the employee-is-an-array, the documentation
says this should work:

    create index bryn2 on paiyroll_payrun using btree ((snapshot ->
'employee[*]'->>'date_of_birth'));

but is silent on the employee-is-an-object case. (As I said, in case
it helps, in my case the keys are in fact stringified numbers).

Thanks, Shaheed



Re: Is it possible to index "deep" into a JSONB column?

From:
Imre Samu
Date:
Hi Shaheed,

> WHAT GOES HERE 

IMHO, check:   jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )

My example:

CREATE TABLE jsonb_table (
    id serial primary key,
    jsonb_col JSONB
);

INSERT INTO jsonb_table(jsonb_col)
VALUES
  ('{"stuff": {},"employee": {"8011":  {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
  ('{"stuff": {},"employee": {"7011":  {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
  ('{"stuff": {},"employee": {"a12":   {"date_of_birth": "2000-01-01"},"b56":  {"date_of_birth": "2000-02-02"}}}')
;

select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  from jsonb_table;

-- create index
create index jpqarr_idx
  on jsonb_table
  using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
-- tests:
select id from jsonb_table
  where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
;
CREATE TABLE
INSERT 0 3
+------------------------------+
|    jsonb_path_query_array    |
+------------------------------+
| ["1980-01-01", "1982-02-02"] |
| ["1970-01-01", "1971-02-02"] |
| ["2000-01-01", "2000-02-02"] |
+------------------------------+
(3 rows)

CREATE INDEX
+----+
| id |
+----+
|  3 |
+----+
(1 row)


Regards,
 Imre



Re: Is it possible to index "deep" into a JSONB column?

From:
Imre Samu
Date:
> In principle, I believe this allows index-assisted access to keys and
> values nested in arrays and inner objects but in practice, it seems
> the planner "often" decides to ignore the index in favour of a table scan. 

Part II: index usage; see the "Bitmap Index Scan on jpqarr_idx"

SET enable_seqscan = OFF;
select * from jsonb_table
  where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
;
+----+---------------------------------------------------------------------------------------------------------------+
| id |                                                   jsonb_col                                                   |
+----+---------------------------------------------------------------------------------------------------------------+
|  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}} |
+----+---------------------------------------------------------------------------------------------------------------+
(1 row)


EXPLAIN ANALYZE select * from jsonb_table
  where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                       QUERY PLAN                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1)                                            |
|   Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text))     |
|   Heap Blocks: exact=1                                                                                                                                  |
|   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1)                                       |
|         Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
| Planning Time: 0.255 ms                                                                                                                                 |
| Execution Time: 0.122 ms                                                                                                                                |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
(7 rows)


regards,
  Imre


Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
Hi Shaheed,

> WHAT GOES HERE 

imho check the:   jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) 

may example:

CREATE TABLE jsonb_table (
    id serial primary key,
    jsonb_col JSONB
);

INSERT INTO jsonb_table(jsonb_col)
VALUES
  ('{"stuff": {},"employee": {"8011":  {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
  ('{"stuff": {},"employee": {"7011":  {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
  ('{"stuff": {},"employee": {"a12":   {"date_of_birth": "2000-01-01"},"b56":  {"date_of_birth": "2000-02-02"}}}')
;

select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  from jsonb_table;

-- create index
create index jpqarr_idx
  on jsonb_table
  using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
-- tests:
select id from jsonb_table
  where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
;
DROP TABLE
CREATE TABLE
INSERT 0 3
+------------------------------+
|    jsonb_path_query_array    |
+------------------------------+
| ["1980-01-01", "1982-02-02"] |
| ["1970-01-01", "1971-02-02"] |
| ["2000-01-01", "2000-02-02"] |
+------------------------------+
(3 rows)

CREATE INDEX
+----+
| id |
+----+
|  3 |
+----+
(1 row)


Regards,
 Imre

Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:

  create index idx1 on mytable using gin (snapshot);

In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).

Now, I know it is possible to index inner objects, so that is snapshot
looks like this:

{
    "stuff": {},
    "more other stuff": {},
    "employee": {
         "1234": {"date_of_birth": "1970-01-01"},
         "56B789": {"date_of_birth": "1971-02-02"}
    }
}

I can say:

  create index idx2 on mytable using gin ((snapshot -> 'employee'));

But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:

  create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));

I believe an asterisk "*" would work if 'employee' was an array, but
here it is a nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).

Thanks, Shaheed


Re: Is it possible to index "deep" into a JSONB column?

От
Shaheed Haque
Дата:
Imre,

I'm gradually working my way into the combination of SQL, JSONB and
jsonpath that this involves even without the indexing, but this looks
very helpful/promising, especially on the indexing. Thanks a lot for
the input,

Shaheed




Re: JSONB index not in use, but is TOAST the real cause of slow query?

От
Jeff Janes
Дата:
On Sat, May 28, 2022 at 1:54 PM Shaheed Haque <shaheedhaque@gmail.com> wrote:
 

And how can I understand the dreadful amount of
time (of course, this is just on my dev machine, but still...)? Is
there a way to see/tweak what TOAST costs or indeed to confirm if it
is even in use?

Turn on track_io_timing, and then do explain (analyse, buffers).  This won't distinguish TOAST from main-table IO, but will at least confirm whether a large amount of IO is happening and how long it is taking.

You can look at pg_statio_user_tables before and after the query to distinguish TOAST, but this won't give you the timing, just the block counts.  And the counts might also be incremented by concurrent users, so on a busy system it is hard to interpret.

Are these queries fast upon repeat execution?  A few meg times fewer than 1000 rows is just not that much with modern hardware; you should be able to just cache it in RAM.
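Jeff's suggestions above could be combined into a diagnostic session along these lines (a sketch only: the table and query follow this thread, and setting track_io_timing may require superuser):

```sql
-- Per-session I/O timing, so EXPLAIN can report read/write times.
SET track_io_timing = on;

-- BUFFERS shows shared-buffer hits vs. reads for the whole plan;
-- with track_io_timing on, I/O time is reported too.
EXPLAIN (ANALYSE, BUFFERS)
SELECT snapshot -> 'stuff' -> 'item'
  FROM paiyroll_payrun
 WHERE snapshot -> 'employee' ? '2209';

-- Block counts with TOAST broken out separately; sample these
-- before and after the query and diff the numbers.
SELECT heap_blks_read, heap_blks_hit,
       toast_blks_read, toast_blks_hit
  FROM pg_statio_user_tables
 WHERE relname = 'paiyroll_payrun';
```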

Cheers,

Jeff

Re: Is it possible to index "deep" into a JSONB column?

От
Shaheed Haque
Дата:
OK, I was able to translate your excellent note into this:

    CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.works_id'));

and query using "@>" to see it in use:

   SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...

   EXPLAIN ANALYSE...
        ->  Bitmap Index Scan on foo  (cost=0.00..8.22 rows=29
width=0) (actual time=0.005..0.005 rows=0 loops=1)
              Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
@> '1091'::jsonb)

Unfortunately, my real query involved multiple AND clauses which (as
per another recent thread) seem to end up fetching the large
(probably TOASTed) JSONB once per AND clause at a cost of 150ms
each. So, I got rid of the multiple ANDs by staying inside the
jsonpath like this:

    SELECT ... WHERE (... AND
        (snapshot @? '$.employee."2209" ? (
            @.pay_graph <> 0 || @.last_run_of_employment == true ||
@.state.employment[last][2] == 0)'
        )
    );

But I have not been able to find an index formulation that the new
jsonpath can use. I tried adding:

    CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
    CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
    CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.pay_graph'));

Any thoughts on indexing for this case? If it makes any difference,
I'm presently on PG12 and PG13 but looking to move to PG14.

Thanks, Shaheed
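One more avenue that may be worth checking here (an untested sketch, not from the thread): `@?` is one of the operators a GIN index on the whole column supports, and the `jsonb_path_ops` operator class yields a smaller, faster index when only `@>`, `@?` and `@@` are needed:

```sql
-- jsonb_path_ops indexes hashes of (path, value) pairs, so it supports
-- only @>, @? and @@ -- but @? is exactly the operator used above.
CREATE INDEX payrun_snapshot_path_ops
    ON paiyroll_payrun
 USING gin (snapshot jsonb_path_ops);
```

Whether the planner can extract anything useful from a jsonpath filter containing `<>` or `||` is a separate question; if nothing is extractable it will still fall back to a sequential scan.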




Re: Is it possible to index "deep" into a JSONB column?

От
Shaheed Haque
Дата:
I forgot to say...

I see the documentation on jsonpath indexing says:

"GIN index extracts statements of following form out of jsonpath:
accessors_chain = const. Accessors chain may consist of .key, [*], and
[index] accessors. jsonb_ops additionally supports .* and .**
accessors"

But I'm unable to make "CREATE INDEX...USING gin ((snapshot ->
'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true
|| @.state.employment[last][2] == 0)'))" trigger that.
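One possible reading of that passage (my interpretation, not confirmed): the extraction applies when a plain GIN index on the whole column is queried via `@?`/`@@`, not to an expression index built with `->` and a jsonpath string; and only `accessors_chain == const` conditions are extractable, so ORing in a `<>` comparison may defeat extraction for the whole filter. A sketch:

```sql
-- Whole-column GIN index; @? / @@ queries are candidates for extraction.
CREATE INDEX payrun_snapshot_gin ON paiyroll_payrun USING gin (snapshot);

-- Extractable shape: a bare equality on an accessor chain.
EXPLAIN ANALYSE SELECT 1 FROM paiyroll_payrun
 WHERE snapshot @? '$.employee."2209" ? (@.last_run_of_employment == true)';

-- Probably not extractable: the <> arm of the OR cannot be answered by
-- the index, so the whole disjunction likely forces a seq scan.
EXPLAIN ANALYSE SELECT 1 FROM paiyroll_payrun
 WHERE snapshot @? '$.employee."2209" ? (@.pay_graph <> 0
                                         || @.last_run_of_employment == true)';
```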
