Discussion: BUG #16280: dead tuples (probably) affect plan and query performance
The following bug has been logged on the website:

Bug reference:      16280
Logged by:          Ilya Serbin
Email address:      iserbin@bostonsd.ru
PostgreSQL version: 11.6
Operating system:   CentOS 7.4

Description:

Hello all,

I am seeing some strange plan changes with a query (the query is quite bad, but I can't understand why the plan changes).

Context:

db1=> \d+ table1
                                        Table "db1.table1"
   Column   |            Type             | Collation | Nullable |              Default               | Storage  | Stats target | Description
------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id         | bigint                      |           | not null | nextval('table1_id_seq'::regclass) | plain    |              |
 table1_id  | character varying(255)      |           | not null |                                    | extended |              |
 content    | jsonb                       |           |          |                                    | extended |              |
 created_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 updated_at | timestamp without time zone |           | not null | now()                              | plain    |              |
 queued_at  | timestamp with time zone    |           |          |                                    | plain    |              |
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "uk_table1_id" UNIQUE CONSTRAINT, btree (table1_id)
    "content_idx" gin (content jsonb_path_ops)
Referenced by:
    TABLE "collection_table1s" CONSTRAINT "fk_collection_table1s_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(id)
    TABLE "db1_table1s" CONSTRAINT "fk_db1_table1s_table1_id" FOREIGN KEY (table1s_id) REFERENCES table1(id)
    TABLE "table1_sort" CONSTRAINT "fk_table1_sort_table1_id" FOREIGN KEY (table1_id) REFERENCES table1(table1_id)

              name              |  setting
--------------------------------+-----------
 constraint_exclusion           | partition
 cpu_index_tuple_cost           | 0.005
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 default_statistics_target      | 200
 effective_cache_size           | 1572864
 enable_bitmapscan              | on
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexonlyscan           | on
 enable_indexscan               | on
 enable_material                | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_parallel_append         | on
 enable_parallel_hash           | on
 enable_partition_pruning       | on
 enable_partitionwise_aggregate | off
 enable_partitionwise_join      | off
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 random_page_cost               | 1.1
 seq_page_cost                  | 1
 shared_buffers                 | 524288
(25 rows)

First plan (good plan and problematic query):
https://explain.tensor.ru/archive/explain/1cf3c0181a9574bf2fd06d6fd07dc201:0:2020-02-27#context

Second plan (same query, bad plan):
https://explain.tensor.ru/archive/explain/3770d39f786135e38bebeb1a8a4b1da9:0:2020-02-27#context

Issue description:

After some minimal activity in the database, the first plan changes to the second. Running ANALYZE on table1 does not help (tried with various default_statistics_target values). Recreating the content_idx index helps for some time, but several minutes later the plan degrades back to the second one. The only thing that helped (surprisingly) is VACUUM. It, too, helps only for some time: once the number of dead tuples reaches something like 300-500, the plan changes back to the second one.

Table details with the bad plan:

db1=> select * from pg_stat_user_tables where relname='table1';
-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116536
idx_tup_fetch       | 42353778
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 379
n_mod_since_analyze | 1169
last_vacuum         | 2020-02-12 08:58:06.147247+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        |
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 2
autovacuum_count    | 702
analyze_count       | 0
autoanalyze_count   | 20

Table details with the good plan after VACUUM:

-[ RECORD 1 ]-------+------------------------------
relid               | 74813598
schemaname          | db1
relname             | table1
seq_scan            | 1167
seq_tup_read        | 315158718
idx_scan            | 23116662
idx_tup_fetch       | 42360694
n_tup_ins           | 409288
n_tup_upd           | 313114963
n_tup_del           | 0
n_tup_hot_upd       | 4683
n_live_tup          | 409288
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-26 08:10:49.884625+01
last_autovacuum     | 2020-02-20 12:29:31.427082+01
last_analyze        | 2020-02-26 08:13:07.253307+01
last_autoanalyze    | 2020-02-20 10:12:11.494305+01
vacuum_count        | 3
autovacuum_count    | 702
analyze_count       | 1
autoanalyze_count   | 20

I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6 since I thought it was a bug that may have been fixed in 11.5: "Fix possible failure of planner's index endpoint probes (Tom Lane)"). It is always reproducible, and can be reproduced on prod and on all lower environments. I can't understand why it happens. As far as I understand, it has something to do with the visibility map, but per the documentation that should affect index-only scans, which is not my case.

My two main questions are:
1) Is it a bug?
2) If it is expected behaviour, can someone please explain why it happens, and whether there is any way to keep the good plan (without installing extensions to force-pin plans)?

Best regards,
Ilya
On 2/27/20 17:28, PG Bug reporting form wrote:
> I can confirm that the issue exists on 11.4 and 11.6 (I updated to 11.6
> since I thought it was a bug that may have been fixed in 11.5: "Fix
> possible failure of planner's index endpoint probes (Tom Lane)"). It is
> always reproducible, and can be reproduced on prod and on all lower
> environments.
> I can't understand why it happens. As far as I understand, it has
> something to do with the visibility map, but per the documentation that
> should affect index-only scans, which is not my case.
> My two main questions are:
> 1) Is it a bug?
> 2) If it is expected behaviour, can someone please explain why it happens,
> and whether there is any way to keep the good plan (without installing
> extensions to force-pin plans)?
>
> Best regards,
> Ilya

Hello,

I don't think that's a bug. Comparing these lines from both plans:

->  Bitmap Index Scan on content_idx  (cost=0.00..155.07 rows=409 width=0) (actual time=4.932..4.932 rows=21952 loops=1)
      Index Cond: (content @> '{"anotherjsonkey": {"values": ["13"]}}'::jsonb)
      Buffers: shared hit=48

->  Bitmap Index Scan on content_idx  (cost=0.00..27.11 rows=414 width=0) (actual time=4.287..4.287 rows=21952 loops=1)
      Index Cond: (content @> '{"anotherjsonkey": {"values": ["13"]}}'::jsonb)
      Buffers: shared hit=16

we can see that the cost of the bitmap index scan is higher in the bad plan, but the actual time is about the same. That could be addressed by lowering seq_page_cost/random_page_cost to something like 1/10th of their current values (0.1/0.11) if your database is on SSD (or fits in memory). The change in cost could possibly be explained by a change in table size (if the table size changed after vacuum). It's also possible to trigger autovacuum on that table more often by changing autovacuum_vacuum_scale_factor/autovacuum_vacuum_threshold (this can be done on the selected table only, if necessary).
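A rough sketch of both of the suggestions above; the specific values are illustrative assumptions, not tested recommendations:

```sql
-- 1/10th of the current planner page costs (SSD / fully cached case):
SET random_page_cost = 0.11;
SET seq_page_cost = 0.1;

-- Per-table autovacuum tuning: fire once ~300 dead tuples accumulate,
-- regardless of table size (scale factor zeroed out).
ALTER TABLE table1 SET (
    autovacuum_vacuum_scale_factor = 0.0,
    autovacuum_vacuum_threshold = 300
);
```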
In the given case, I think the main reason for the problem is that the planner underestimates the cost of rechecking the condition on a large jsonb value; AFAIK the planner doesn't account for the width of the column, but I might be wrong.

--
Alexey Ermakov
PG Bug reporting form <noreply@postgresql.org> writes:
> Issue description:
> After some minimal activity in database first plan changes to the second.
> Analyze on table1 do not help (tried with various default_statistics_target
> values).
> content_idx index recreation helps for some time, but several minutes later
> plan degrades back to second one.
> The only thing helped (surprisingly) is vacuum. It also helps for some time,
> but once number of dead tuples reaches something like 300-500 - plan
> changes back to second one.

Surely it is *not* a bug that dead tuples affect the plan choice.
The density of live tuples is an important factor in the relative
costs of different table scan techniques.

In the case at hand, I wonder why your rowcount estimate is off
by a factor of 50:

->  Bitmap Index Scan on content_idx  (cost=0.00..155.07 rows=409 width=0) (actual time=4.932..4.932 rows=21952 loops=1)
      Index Cond: (content @> '{"anotherjsonkey": {"values": ["13"]}}'::jsonb)
      Buffers: shared hit=48

If you can't improve that you're not likely to get a good plan, and
futzing around with cost factors to make this particular query do
"the right thing" anyway is inevitably going to make things worse
for other queries.  Maybe a larger stats target for the content column
would help, but I fear that this @> condition is just beyond
the ability of the planner to estimate.  You might need to redesign
the data representation to make it a bit more SQL-friendly.

			regards, tom lane
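For reference, the stats-target suggestion above would look like this; the value 1000 is just an illustration (the upper limit is 10000):

```sql
-- Raise the per-column statistics target for content, then re-analyze
-- so the new target takes effect.
ALTER TABLE table1 ALTER COLUMN content SET STATISTICS 1000;
ANALYZE table1;
```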
Hello, Tom!
Thanks for your answer. My concern is that the plan changes after a relatively small number of dead tuples. The bad plan is generated when table1 contains 300-400 dead tuples. That is only 0.07%-0.08% of the whole table (409k+ entries).
In addition, the table is growing and currently holds 425k entries. However, even at 425k entries the plan stays good until the number of dead tuples reaches 300-400; after that the plan changes to the bad one.
As I said, I tried analyzing the table with various default_statistics_target values (100-1000 in steps of 100) - the plan stays bad. I also tried setting random_page_cost=0.1 and seq_page_cost=1 (2, 3, 4, etc.). The plan changed to the good one only starting from random_page_cost=0.1 and seq_page_cost=8. However, once I ran VACUUM, the plan changed to the good one and stayed that way even when I set random_page_cost=30; set seq_page_cost=1;
I realize that I can set the autovacuum thresholds for this table so that it triggers once dead tuples reach 300, but that doesn't seem right to me (this number of changes happens in something like 5 minutes, and the table is ~2GB in size as of now).
Why does such a small amount (0.07%) of dead tuples change the cost estimates so dramatically? Or am I missing something, and dead tuples have nothing to do with it?
Thanks in advance,
Ilya
On Thu, Feb 27, 2020 at 18:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> Issue description:
>> After some minimal activity in database first plan changes to the second.
>> Analyze on table1 do not help (tried with various default_statistics_target
>> values).
>> content_idx index recreation helps for some time, but several minutes later
>> plan degrades back to second one.
>> The only thing helped (surprisingly) is vacuum. It also helps for some time,
>> but once number of dead tuples reaches something like 300-500 - plan
>> changes back to second one.
>
> Surely it is *not* a bug that dead tuples affect the plan choice.
> The density of live tuples is an important factor in the relative
> costs of different table scan techniques.
>
> In the case at hand, I wonder why your rowcount estimate is off
> by a factor of 50:
>
> ->  Bitmap Index Scan on content_idx  (cost=0.00..155.07 rows=409
> width=0) (actual time=4.932..4.932 rows=21952 loops=1)
>       Index Cond: (content @> '{"anotherjsonkey": {"values": ["13"]}}'::jsonb)
>       Buffers: shared hit=48
>
> If you can't improve that you're not likely to get a good plan, and
> futzing around with cost factors to make this particular query do
> "the right thing" anyway is inevitably going to make things worse
> for other queries.  Maybe a larger stats target for the content column
> would help, but I fear that this @> condition is just beyond
> the ability of the planner to estimate.  You might need to redesign
> the data representation to make it a bit more SQL-friendly.
>
> 			regards, tom lane
Ilya Serbin | Senior Database Administrator
iserbin@bostonsd.ru
Mobile: 8-918-895-05-96 | Telegram: @iserbin_61
DBI 24/7 contact: 8-800-333-65-79 | Website: http://dbi.ru/
Hello,
Asking again, just in case someone can help figure out why this is happening in my case and whether it is a bug.
On Fri, Feb 28, 2020 at 12:25, Serbin, Ilya <iserbin@bostonsd.ru> wrote:
> Hello, Tom!
> Thanks for your answer. My concern is that the plan changes after a
> relatively small number of dead tuples. The bad plan is generated when
> table1 contains 300-400 dead tuples. That is only 0.07%-0.08% of the
> whole table (409k+ entries).
> In addition, the table is growing and currently holds 425k entries.
> However, even at 425k entries the plan stays good until the number of
> dead tuples reaches 300-400; after that the plan changes to the bad one.
> As I said, I tried analyzing the table with various
> default_statistics_target values (100-1000 in steps of 100) - the plan
> stays bad. Tried setting random_page_cost=0.1 and seq_page_cost=1 (2, 3,
> 4, etc.). The plan changed to the good one only starting from
> random_page_cost=0.1 and seq_page_cost=8. However, once I ran VACUUM,
> the plan changed to the good one and stayed the same even when I set
> random_page_cost=30; set seq_page_cost=1;
> I realize that I can set autovacuum thresholds for this table to trigger
> it once dead tuples reach 300, but it doesn't seem right to me (this
> number of changes happens in something like 5 minutes and the table is
> ~2GB in size as of now).
> Why does such a small amount (0.07%) of dead tuples change the cost
> estimates so dramatically? Or am I missing something, and dead tuples
> have nothing to do with it?
On Wed, Mar 11, 2020 at 6:51 AM Serbin, Ilya <iserbin@bostonsd.ru> wrote:
>
> Hello,
> Asking again, just in case someone can help figure out why this is happening in my case and whether it is a bug.
>
> On Fri, Feb 28, 2020 at 12:25, Serbin, Ilya <iserbin@bostonsd.ru> wrote:
>>
>> Hello, Tom!
>> Thanks for your answer. My concern is that the plan changes after a relatively small number of dead tuples. The bad plan is being generated when table1 contains 300-400 dead tuples. That is only 0.07%-0.08% of the whole table (409k+ entries).
>> In addition, the table is growing and currently holds 425k entries. However, even at 425k entries the plan stays good until the number of dead tuples reaches 300-400; after that the plan changes to the bad one.
>> As I said, I tried analyzing the table with various default_statistics_target values (100-1000 in steps of 100) - the plan stays bad. Tried setting random_page_cost=0.1 and seq_page_cost=1 (2, 3, 4, etc.). The plan changed to the good one only starting from random_page_cost=0.1 and seq_page_cost=8. However, once I ran VACUUM, the plan changed to the good one and stayed the same even when I set random_page_cost=30; set seq_page_cost=1;
>> I realize that I can set autovacuum thresholds for this table to trigger it once dead tuples reach 300, but it doesn't seem right to me (this number of changes happens in something like 5 minutes and the table is ~2GB in size as of now).
>>
>> Why does such a small amount (0.07%) of dead tuples change the cost estimates so dramatically? Or am I missing something, and dead tuples have nothing to do with it?

That looks like an almost impossible query to plan correctly -- the planner would need some kind of stats on fields internal to the json blob, but it can't have that, since the blob is necessarily schemaless (at least from PG's perspective). That's why Tom is wondering about making the representation map more closely to a SQL/relational table model with discrete columns.
Alternatively, have you tried adding expression indexes on the (it looks like) ~2 paths you're querying in the JSON field? It's not impossible that there's a bug in the planner here, but the opaqueness of this query to the planner is why that's not anyone's first assumption. To show otherwise, someone would have to step through the execution and show that it's not just reaching some cutoff (one that makes somewhat reasonable sense for how hard this query is to plan), or that the cutoff is being calculated incorrectly.

James
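A hedged sketch of the expression-index idea, assuming one of the queried paths is the anotherjsonkey path visible in the plans (the index name is made up; a GIN index on the sub-document keeps @> containment usable, but the query has to be rewritten to target the same expression):

```sql
-- Expression GIN index on one JSON sub-path (path name taken from the
-- plans in this thread; adjust to the paths actually queried).
CREATE INDEX table1_anotherjsonkey_idx
    ON table1 USING gin ((content -> 'anotherjsonkey') jsonb_path_ops);

-- The query must then match the indexed expression, e.g.:
-- SELECT ... FROM table1
--  WHERE content -> 'anotherjsonkey' @> '{"values": ["13"]}';
```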
It seems like I found the reason. I recreated the "content_idx" gin (content jsonb_path_ops) index with fastupdate=off, and the good plan stayed even at 4000+ dead tuples (as I said previously, the good plan changed to the bad one starting at ~300 dead tuples in the table).
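The recreation described above could be sketched as follows (DDL reconstructed from the index definition earlier in the thread; as an alternative to a full rebuild, fastupdate can also be turned off in place, though existing pending-list entries are only flushed by VACUUM or gin_clean_pending_list()):

```sql
-- Rebuild content_idx without the GIN fast-update pending list.
DROP INDEX content_idx;
CREATE INDEX content_idx ON table1
    USING gin (content jsonb_path_ops)
    WITH (fastupdate = off);

-- Alternative, without a full rebuild:
-- ALTER INDEX content_idx SET (fastupdate = off);
-- SELECT gin_clean_pending_list('content_idx');
```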
So, as far as I understand, the issue arises once the GIN pending list reaches ~300 entries. AFAIK the GIN pending list is stored unsorted, so the optimizer may decide to choose another plan, since the cost estimate for scanning an unsorted GIN pending list may be too high.
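If this theory is right, it should be checkable directly: the pending-list size can be inspected with the pgstattuple extension (assuming it is available on the server), which would let you correlate the plan flip with pending_tuples rather than with n_dead_tup:

```sql
-- Requires the pgstattuple contrib extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pending_pages / pending_tuples show the current fast-update backlog.
SELECT version, pending_pages, pending_tuples
FROM pgstatginindex('content_idx');
```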
By disabling the fastupdate mechanism I have raised the overhead of write operations, but scans now use only the index body, without the GIN pending list (since it no longer exists).
I would much appreciate it if someone could confirm or disprove my conclusions.