Обсуждение: Enabling deduplication with system catalog indexes
System catalog indexes do not support deduplication as a matter of policy. I chose to do things that way during the Postgres 13 development cycle due to the restriction on using storage parameters with system catalog indexes. At the time I felt that *forcing* the use of deduplication with system catalog indexes might expose users to problems. But this is something that seems worth revisiting now. (I haven't actually investigated what it would take to make system catalogs support the 'deduplicate_items' parameter, but that may not matter now.) I would like to enable deduplication within system catalog indexes for Postgres 15. Leaving it disabled forever seems kind of arbitrary at best. In general enabling deduplication (or not disabling it) has only a fixed, small downside in the worst case. It has a huge upside in favorable cases. Deduplication is part of our high level strategy for avoiding nbtree index bloat from version churn (non-HOT updates with several indexes that are never "logically modified"). It effectively cooperates with and enhances the new enhancements to index deletion in Postgres 14. Plus these recent index deletion enhancements more or less eliminated a theoretical downside of deduplication: now it doesn't really matter that posting list tuples only have a single LP_DEAD bit (if it ever did). This is because we can now do granular posting list TID deletion, provided the deletion process visits the same heap block in passing. I can find no evidence that even one single user found it useful to disable deduplication while using Postgres 13 in production (by searching for "deduplicate_items" on Google). While I myself said that there might be a regression of up to 2% of throughput back in early 2020, that was under highly unrealistic conditions, that could never apply to system catalogs -- I was being conservative. Most system catalog indexes are unique indexes, where there is no possible overhead from deduplication unless we already know for sure that the index is subject to some kind of version churn (and so have high confidence that deduplication will be at least somewhat effective at buying time for VACUUM). The non-unique system catalog indexes seem pretty likely to benefit from deduplication in the usual obvious way (not so much because of versioning and bloat). The two pg_depend non-unique indexes tend to have a fair number of duplicates. -- Peter Geoghegan
On Wed, Sep 29, 2021 at 11:27 AM Peter Geoghegan <pg@bowt.ie> wrote: > I would like to enable deduplication within system catalog indexes for > Postgres 15. I decided to run a simple experiment, to give us some idea of what benefits my proposal gives users: I ran "make installcheck" on a newly initdb'd database (master branch), and then with the attached patch (which enables deduplication with system catalog indexes) applied. I ran a query that shows the 20 largest system catalog indexes in each case. I'm interested in when and where we see improvements to space utilization. Any reduction in index size must be a result of index deduplication (excluding any noise-level changes). Master branch: regression=# SELECT pg_size_pretty(pg_relation_size(c.oid)) as sz, c.relname FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog' AND c.relkind = 'i' AND i.indisready AND i.indisvalid ORDER BY pg_relation_size(c.oid) DESC LIMIT 20; sz | relname ---------+----------------------------------- 1088 kB | pg_attribute_relid_attnam_index 928 kB | pg_depend_depender_index 800 kB | pg_attribute_relid_attnum_index 736 kB | pg_depend_reference_index 352 kB | pg_proc_proname_args_nsp_index 216 kB | pg_description_o_c_o_index 200 kB | pg_class_relname_nsp_index 184 kB | pg_type_oid_index 176 kB | pg_class_tblspc_relfilenode_index 160 kB | pg_type_typname_nsp_index 104 kB | pg_proc_oid_index 64 kB | pg_class_oid_index 64 kB | pg_statistic_relid_att_inh_index 56 kB | pg_collation_name_enc_nsp_index 48 kB | pg_constraint_conname_nsp_index 48 kB | pg_amop_fam_strat_index 48 kB | pg_amop_opr_fam_index 48 kB | pg_largeobject_loid_pn_index 48 kB | pg_operator_oprname_l_r_n_index 48 kB | pg_index_indexrelid_index (20 rows) Patch: sz | relname ---------+----------------------------------- 1048 kB | pg_attribute_relid_attnam_index 888 kB | pg_depend_depender_index 752 kB | pg_attribute_relid_attnum_index 616 kB | pg_depend_reference_index 352 kB | pg_proc_proname_args_nsp_index 216 kB | pg_description_o_c_o_index 192 kB | pg_class_relname_nsp_index 184 kB | pg_type_oid_index 152 kB | pg_type_typname_nsp_index 144 kB | pg_class_tblspc_relfilenode_index 104 kB | pg_proc_oid_index 72 kB | pg_class_oid_index 56 kB | pg_collation_name_enc_nsp_index 56 kB | pg_statistic_relid_att_inh_index 48 kB | pg_index_indexrelid_index 48 kB | pg_amop_fam_strat_index 48 kB | pg_amop_opr_fam_index 48 kB | pg_largeobject_loid_pn_index 48 kB | pg_operator_oprname_l_r_n_index 40 kB | pg_index_indrelid_index (20 rows) The improvements to space utilization for the larger indexes (especially the two pg_depends non-unique indexes) is smaller than I remember from last time around, back in early 2020. This is probably due to a combination of the Postgres 14 work and the pg_depend PIN optimization work from commit a49d0812. The single biggest difference is the decrease in the size of pg_depend_reference_index -- it goes from 736 kB to 616 kB. Another notable difference is that pg_class_tblspc_relfilenode_index shrinks, going from 176 kB to 144 kB. These are not huge differences, but they still seem worth having. The best argument in favor of my proposal is definitely the index bloat argument, which this test case tells us little or nothing about. I'm especially concerned about scenarios where logical replication is used, or where index deletion and VACUUM are inherently unable to remove older index tuple versions for some other reason. -- Peter Geoghegan
Вложения
On Wed, Sep 29, 2021 at 3:32 PM Peter Geoghegan <pg@bowt.ie> wrote: > I decided to run a simple experiment, to give us some idea of what > benefits my proposal gives users: I ran "make installcheck" on a newly > initdb'd database (master branch), and then with the attached patch > (which enables deduplication with system catalog indexes) applied. I will commit this patch in a few days, barring objections. -- Peter Geoghegan
On 9/30/21, 3:44 PM, "Peter Geoghegan" <pg@bowt.ie> wrote: > On Wed, Sep 29, 2021 at 3:32 PM Peter Geoghegan <pg@bowt.ie> wrote: >> I decided to run a simple experiment, to give us some idea of what >> benefits my proposal gives users: I ran "make installcheck" on a newly >> initdb'd database (master branch), and then with the attached patch >> (which enables deduplication with system catalog indexes) applied. > > I will commit this patch in a few days, barring objections. +1 Nathan
On Fri, Oct 1, 2021 at 2:35 PM Bossart, Nathan <bossartn@amazon.com> wrote: > On 9/30/21, 3:44 PM, "Peter Geoghegan" <pg@bowt.ie> wrote: > > I will commit this patch in a few days, barring objections. > > +1 Okay, pushed. Thanks -- Peter Geoghegan