Thread: GIN JSONB path index is not always used


GIN JSONB path index is not always used

From:
Tomasz Szymański
Date:
Hello! We have an issue with the database planner choosing a really expensive sequential scan instead of an index scan in some cases.
I'm reaching out to maybe get some idea of what we're dealing with / what the actual issue could be here.
The affected table is the users table, with a field called "private_metadata" of type JSONB.
There are currently two GIN indexes on that table - one plain GIN and a second one using jsonb_path_ops.
The operator used by the problematic query is contains (@>), so jsonb_path_ops should be preferred according to the docs: https://www.postgresql.org/docs/11/gin-builtin-opclasses.html
However, the database sometimes decides NOT to use the index and performs a sequential scan.
I have observed that it happens in intervals - e.g. everything is OK for 10 hours, then something snaps and there are sequential scans for a few more hours, then back to index scans, and so on.
More context:
- Database version: 11.18
- The table schema with indexes
CREATE TABLE account_user (
    id integer NOT NULL,
    private_metadata jsonb
);
CREATE INDEX user_p_meta_idx ON account_user USING gin (private_metadata);
CREATE INDEX user_p_meta_jsonb_path_idx ON account_user USING gin (private_metadata jsonb_path_ops);
- The query that is performed
SELECT "account_user"."id", "account_user"."private_metadata"
FROM "account_user"
WHERE "account_user"."private_metadata" @> '{"somekey": "somevalue"}'
LIMIT 21;
- Plan when it uses an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 1091.29,
    "Total Cost": 1165.26,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Bitmap Heap Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 1091.29,
        "Total Cost": 17129.12,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Recheck Cond": "( private_metadata @ > ? :: jsonb )",
        "Plans": [
          {
            "Node Type": "Bitmap Index Scan",
            "Parent Relationship": "Outer",
            "Parallel Aware": false,
            "Index Name": "user_p_meta_jsonb_path_idx",
            "Startup Cost": 0,
            "Total Cost": 1090.15,
            "Plan Rows": 4553,
            "Plan Width": 0,
            "Index Cond": "( private_metadata @ > ? :: jsonb )"
          }
        ]
      }
    ]
  }
}
- Plan when it doesn't use an index
{
  "Plan": {
    "Node Type": "Limit",
    "Parallel Aware": false,
    "Startup Cost": 0,
    "Total Cost": 1184.3,
    "Plan Rows": 21,
    "Plan Width": 4,
    "Plans": [
      {
        "Node Type": "Seq Scan",
        "Parent Relationship": "Outer",
        "Parallel Aware": false,
        "Relation Name": "account_user",
        "Alias": "account_user",
        "Startup Cost": 0,
        "Total Cost": 256768.27,
        "Plan Rows": 4553,
        "Plan Width": 4,
        "Filter": "( private_metadata @ > ? :: jsonb )"
      }
    ]
  }
}
- There are ~4.5M rows in the table
- We currently have a script that is heavily inserting rows 24/7, about 130k rows/day (~1.5 rows/s)
- It seems maybe the index can't keep up(?) because of this heavy insertion
SELECT * FROM pgstatginindex('user_p_meta_jsonb_path_idx');
 version | pending_pages | pending_tuples
---------+---------------+----------------
       2 |            98 |          28807
(1 row)
Might it be the case that it clogs up and the index can't be used when reading?
- Last autovacuum for some reason happened 4 days ago
select * from pg_stat_user_tables where relname = 'account_user';
-[ RECORD 1 ]-------+-----------------------------
relid               | 74937
schemaname          | public
relname             | account_user
seq_scan            | 66578
seq_tup_read        | 99542628744
idx_scan            | 9342647
idx_tup_fetch       | 105527685
n_tup_ins           | 518901
n_tup_upd           | 684607
n_tup_del           | 25
n_tup_hot_upd       | 82803
n_live_tup          | 4591412
n_dead_tup          | 370828
n_mod_since_analyze | 152968
last_vacuum         |
last_autovacuum     | 2023-10-13 07:35:29.11448+00
last_analyze        |
last_autoanalyze    | 2023-10-13 07:44:31.90437+00
vacuum_count        | 0
autovacuum_count    | 2
analyze_count       | 0
- The plans above come from a production system. I failed to reproduce the issue locally; the only two cases in which I managed to get a seq scan were:
a) when there are very few rows in the table
b) when I run the filter @> '{}', in which case I suppose Postgres deduces "you want everything", so an additional index lookup is not necessary (checked, and this filter is impossible to induce by our code); a quick check of this is shown below.
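
To illustrate b): the empty object is contained in every JSONB object, so a @> '{}' predicate matches essentially every row and an index lookup buys nothing:

SELECT '{"somekey": "somevalue"}'::jsonb @> '{}'::jsonb;
-- returns true; the same holds for any jsonb object on the left side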

Any feedback is appreciated! What could be the reason the planner chooses a seq scan in that case?


Re: GIN JSONB path index is not always used

From:
Laurenz Albe
Date:
On Tue, 2023-10-17 at 15:48 +0200, Tomasz Szymański wrote:
> Hello! We have an issue with the database planner choosing a really expensive sequential scan instead of an index scan in some cases.

To analyze that, we'd need the output from EXPLAIN (ANALYZE, BUFFERS) SELECT ...
Plain text format please, no JSON.
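
For example, against the query from the original post (TEXT is the default EXPLAIN format, so no FORMAT option is needed):

EXPLAIN (ANALYZE, BUFFERS)
SELECT "account_user"."id", "account_user"."private_metadata"
FROM "account_user"
WHERE "account_user"."private_metadata" @> '{"somekey": "somevalue"}'
LIMIT 21;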

Yours,
Laurenz Albe



Re: GIN JSONB path index is not always used

From:
Jeff Janes
Date:
On Tue, Oct 17, 2023 at 10:09 AM Tomasz Szymański <lime129@gmail.com> wrote:

- Database version: 11.18

That is pretty old.  It is 3 bug-fix releases out of date even for its major version, and the major version itself is just about to reach EOL and is missing relevant improvements. 

- Plan when it uses an index
    "Total Cost": 1165.26,
- Plan when it doesn't use an index
    "Total Cost": 1184.3,

The JSON format for plans is pretty non-ideal for human inspection; especially so once you include ANALYZE and BUFFERS, which you should do.  Please use the plain text format instead.  But I can see that the plans are very similar in cost, so it wouldn't take much to shift between them. Should we assume that not using the index is much slower (otherwise, why would you be asking the question?)?

 
- It seems maybe the index can't keep up(?) because of this heavy insertion
SELECT * FROM pgstatginindex('user_p_meta_jsonb_path_idx');
 version | pending_pages | pending_tuples
---------+---------------+----------------
       2 |            98 |          28807
(1 row)
Might it be the case that it clogs up and the index can't be used when reading?

Definitely possible.  The planner does take those numbers into account when planning.  The easiest thing would be to just turn off fastupdate for those indexes.  That might make the INSERTs somewhat slower (it is hard to predict how much and you haven't complained about the performance of the INSERTs anyway) but should make the SELECTs more predictable and generally faster.  I habitually turn fastupdate off and then turn it back on only if I have an identifiable cause to do so.
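
Turning it off is a one-statement change per index (index names taken from the original post):

ALTER INDEX user_p_meta_jsonb_path_idx SET (fastupdate = off);
ALTER INDEX user_p_meta_idx SET (fastupdate = off);

Note this only affects future insertions; entries already queued stay in the pending list until a vacuum (or a call to gin_clean_pending_list) moves them into the main index structure.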
 
If you don't want to turn fastupdate off, you could instead change the table's autovac parameters to be more aggressive (particularly n_ins_since_vacuum, except that that doesn't exist until v13), or have a cron job call gin_clean_pending_list periodically.
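
A sketch of both alternatives; the scale factor here is an illustrative value to tune for your workload:

-- Per-table autovacuum tuning (vacuums after ~2% churn instead of the default 20%):
ALTER TABLE account_user SET (autovacuum_vacuum_scale_factor = 0.02);

-- Or flush the GIN pending list on a schedule, e.g. from cron:
SELECT gin_clean_pending_list('user_p_meta_jsonb_path_idx');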


- Last autovacuum for some reason happened 4 days ago

n_live_tup          | 4591412
n_dead_tup          | 370828

Based on those numbers and default parameters, there is no reason for it to be running any sooner.  That reflects only 8% turnover while the default factor is 20%.
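
With the default settings, the trigger point works out as:

    vacuum threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * n_live_tup
                     = 50 + 0.2 * 4591412
                     ≈ 918332 dead tuples

n_dead_tup is only 370828, well below that, so autovacuum correctly stays quiet.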

Cheers,

Jeff

Re: GIN JSONB path index is not always used

From:
Tomasz Szymański
Date:
Sorry for missing ANALYZE and BUFFERS - we only had those plans at the time. Here are ones performed with those options:

When it does use an index:

Limit  (cost=255.29..329.26 rows=21 width=0) (actual time=8.023..8.025 rows=1 loops=1)
  Buffers: shared hit=54 read=6
  I/O Timings: read=7.094
  ->  Bitmap Heap Scan on account_user  (cost=255.29..16293.12 rows=4553 width=0) (actual time=8.022..8.023 rows=1 loops=1)
        Recheck Cond: (private_metadata @> '{"somekey": "somevalue"}'::jsonb)
        Heap Blocks: exact=2
        Buffers: shared hit=54 read=6
        I/O Timings: read=7.094
        ->  Bitmap Index Scan on user_p_meta_idx  (cost=0.00..254.15 rows=4553 width=0) (actual time=7.985..7.985 rows=2 loops=1)
              Index Cond: (private_metadata @> '{"somekey": "somevalue"}'::jsonb)
              Buffers: shared hit=52 read=6
              I/O Timings: read=7.094
Planning Time: 1.134 ms
Execution Time: 8.065 ms

When it does not:

Limit  (cost=0.00..1184.30 rows=21 width=4) (actual time=1567.136..1619.956 rows=1 loops=1)
  Buffers: shared hit=199857
  ->  Seq Scan on account_user  (cost=0.00..256768.27 rows=4553 width=4) (actual time=1567.135..1619.953 rows=1 loops=1)
        Filter: (private_metadata @> '{"somekey": "somevalue"}'::jsonb)
        Rows Removed by Filter: 4592408
        Buffers: shared hit=199857
Planning Time: 0.072 ms
Execution Time: 1619.972 ms

> Should we assume that not using the index is much slower (otherwise, why would you be asking the question?)?
Yes, the issue is the sequential scan being expensive and slow.





Re: GIN JSONB path index is not always used

From:
Jeff Janes
Date:
On Mon, Oct 23, 2023 at 6:33 AM Tomasz Szymański <lime129@gmail.com> wrote:
 
 Limit  (cost=0.00..1184.30 rows=21 width=4) (actual time=1567.136..1619.956 rows=1 loops=1)
   ->  Seq Scan on account_user  (cost=0.00..256768.27 rows=4553 width=4) (actual time=1567.135..1619.953 rows=1 loops=1)

It thinks the seq scan will stop 99.5% early, after finding 21 out of 4553 qualifying tuples.  But instead it has to read the entire table to actually find only 1.
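
The arithmetic behind that: the Limit node scales the scan's total cost by the fraction of estimated matches it expects to need, roughly

    256768.27 * (21 / 4553) ≈ 1184.3

which is exactly the Limit cost shown. With only 1 actual match, that early exit never happens and the whole table is read.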

The selectivity estimate of the @> operator has been substantially improved in v13.  It is still far from perfect, but should be good enough to solve this problem for this case and most similar cases.  Turning off fastupdate on the index would probably also solve the problem, for a different reason.

Cheers,

Jeff