Thread: BUG #16586: deduplicate_items=true can be configured for numeric indexes
BUG #16586: deduplicate_items=true can be configured for numeric indexes
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16586
Logged by:          Matthias van de Meent
Email address:      matthias.vandemeent@cofano.nl
PostgreSQL version: 13beta3
Operating system:   Debian Stretch (9.13)
Description:

> CREATE INDEX numerical_index ON table USING btree ((num::numeric)) WITH (deduplicate_items=true);
CREATE INDEX
> \d+ numerical_index
              Index "public.numerical_index"
 Column |  Type   | Key? | Definition | Storage | Stats target
--------+---------+------+------------+---------+--------------
 num    | numeric | yes  | num        | main    |
btree, for table "public.table"
Options: deduplicate_items=true

There is no error for specifying the "deduplicate_items" -flag. As
deduplication is not supported for indexes with numeric type, I expected the
index creation statement to error.
Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes
From
Peter Geoghegan
Date:
On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> There is no error for specifying the "deduplicate_items" -flag. As
> deduplication is not supported for indexes with numeric type, I expected the
> index creation statement to error.

I don't think that there should be an error. While the
"equalimage"-ness of an operator class (such as btree/numeric_ops) is
in theory static, in practice it could change in either direction. For
example, it's possible (though very unlikely) that somebody will make
the mistake of marking an operator class as equalimage/dedup safe when
they shouldn't have. If this actually happens, a REINDEX shouldn't
raise errors with the same spelling of REINDEX that worked the first
time (e.g. when restoring a dump).

The deduplicate_items storage parameter is kind of an advisory thing.
Deduplication is always applied selectively in unique indexes, even
though it might be slightly better to do so consistently with some
workloads. Also, it's possible that we'll find a way to make some of
the operator classes (though not btree/numeric_ops) deduplication safe
in the future. For example, we could teach container types to report
their "equalimage"-ness by invoking the underlying support function of
contained types. So you could use deduplication with a composite type,
provided it didn't contain unsafe scalar types like numeric.

In general I don't expect that users will consciously think about
deduplication very often -- it's supposed to have very little overhead
in cases that don't benefit, so it will probably fade into the
background even in installations where it provides a lot of benefit. I
don't expect many users will want to make sure that it's enabled in
one index but definitely not enabled in another.

With all of that said, it would be nice if I could raise a NOTICE or
even a WARNING here if and only if the user spelled out
"deduplicate_items = on". Hard to see how to do that with the current
design of reloptions, though, unless it's okay to show it even when
"deduplicate_items = on" was not specifically provided (I don't think
that it's okay). An index access method (such as nbtree) can tell
whether or not all storage params should come from the defaults by
checking if the rel's rd_options is NULL or not, but that's not the
same thing -- it'll be set when fillfactor was explicitly set, for
example.

--
Peter Geoghegan
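[Editorial sketch] The rd_options point in the message above can be illustrated with a toy model. This is not PostgreSQL code: `ParsedOptions`, `parse_reloptions` and `user_asked_for_dedup` are hypothetical names invented for this example, and the only facts taken from PostgreSQL are that nbtree's default fillfactor is 90 and that deduplicate_items defaults to on. The assumption modelled here is that parsing fills every unspecified option with its default, so the parsed result no longer records which options the user actually wrote.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only: these names are hypothetical, not PostgreSQL's C structs.
DEFAULTS = {"fillfactor": 90, "deduplicate_items": True}


@dataclass
class ParsedOptions:
    fillfactor: int
    deduplicate_items: bool


def parse_reloptions(with_clause: dict) -> Optional[ParsedOptions]:
    """Return None when no options were given (the 'rd_options is NULL' case);
    otherwise return a struct with defaults filled in for anything omitted."""
    if not with_clause:
        return None
    return ParsedOptions(**{**DEFAULTS, **with_clause})


def user_asked_for_dedup(rd_options: Optional[ParsedOptions]) -> bool:
    """Naive check: any non-NULL rd_options with dedup on counts as 'explicit'."""
    return rd_options is not None and rd_options.deduplicate_items


explicit = parse_reloptions({"deduplicate_items": True})
only_fillfactor = parse_reloptions({"fillfactor": 70})
no_options = parse_reloptions({})

print(user_asked_for_dedup(explicit))         # user really asked for it
print(user_asked_for_dedup(only_fillfactor))  # user only set fillfactor
print(user_asked_for_dedup(no_options))       # everything from defaults
```

In this model the first two calls give the same answer, which is the problem described: a warning keyed on this check would also fire for an index where only fillfactor was set.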
Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes
From
Matthias van de Meent
Date:
On Sat, 22 Aug 2020 at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > There is no error for specifying the "deduplicate_items" -flag. As
> > deduplication is not supported for indexes with numeric type, I expected the
> > index creation statement to error.
>
> I don't think that there should be an error. While the
> "equalimage"-ness of an operator class (such as btree/numeric_ops) is
> in theory static, in practice it could change in either direction. For
> example, it's possible (though very unlikely) that somebody will make
> the mistake of marking an operator class as equalimage/dedup safe when
> they shouldn't have. If this actually happens, a REINDEX shouldn't
> raise errors with the same spelling of REINDEX that worked the first
> time (e.g. when restoring a dump).
>
> The deduplicate_items storage parameter is kind of an advisory thing.

The current documentation is quite unclear about that, as the flag
itself is documented as "Controls usage of the B-tree deduplication
technique described in Section 63.4.2.". A note "Even when configured,
the feature will not be used if it does not pass the limitations as
described in section 63.4.2" would help in preventing confusion.

> Deduplication is always applied selectively in unique indexes, even
> though it might be slightly better to do so consistently with some
> workloads. Also, it's possible that we'll find a way to make some of
> the operator classes (though not btree/numeric_ops) deduplication safe
> in the future. For example, we could teach container types to report
> their "equalimage"-ness by invoking the underlying support function of
> contained types. So you could use deduplication with a composite type,
> provided it didn't contain unsafe scalar types like numeric.
>
> In general I don't expect that users will consciously think about
> deduplication very often -- it's supposed to have very little overhead
> in cases that don't benefit, so it will probably fade into the
> background even in installations where it provides a lot of benefit. I
> don't expect many users will want to make sure that it's enabled in
> one index but definitely not enabled in another.
>
> With all of that said, it would be nice if I could raise a NOTICE or
> even a WARNING here if and only if the user spelled out
> "deduplicate_items = on". Hard to see how to do that with the current
> design of reloptions, though, unless it's okay to show it even when
> "deduplicate_items = on" was not specifically provided (I don't think
> that it's okay). An index access method (such as nbtree) can tell
> whether or not all storage params should come from the defaults by
> checking if the rel's rd_options is NULL or not, but that's not the
> same thing -- it'll be set when fillfactor was explicitly set, for
> example.

Thanks for the reply, it was very insightful.

- Matthias

> --
> Peter Geoghegan