Обсуждение: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose
The following bug has been logged on the website: Bug reference: 15952 Logged by: Neszt Tibor Email address: tibor@neszt.hu PostgreSQL version: 9.6.14 Operating system: Debian 8.11 Description: Schema: => \d cimek Table "public.cimek" Column | Type | Modifiers ----------------+-----------------------+----------------------------------------------------- kod | bigint | not null default nextval('cimek_kod_seq'::regclass) irsz | character varying(16) | not null default ''::character varying telepules | character varying(64) | not null default ''::character varying telepulesresz | character varying(64) | not null default ''::character varying kozternev | character varying(64) | not null default ''::character varying kozterjelleg | character varying(32) | not null default ''::character varying hazszam | character varying(64) | not null default ''::character varying hazszambetujel | character varying(64) | not null default ''::character varying hazszamvege | character varying(64) | not null default ''::character varying lepcsohaz | character varying(32) | not null default ''::character varying em | character varying(32) | not null default ''::character varying ajto | character varying(64) | not null default ''::character varying ajtobetujel | character varying(64) | not null default ''::character varying longitude | character varying(32) | not null default ''::character varying latitude | character varying(32) | not null default ''::character varying Indexes: "cimek_pkey" PRIMARY KEY, btree (kod) "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg, hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel) Actual behavior: => select count(*), min(kod), max(kod), irsz, telepules, telepulesresz, kozternev, kozterjelleg, hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel from cimek group by irsz, telepules, telepulesresz, kozternev, kozterjelleg, hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel having count(*) > 1; count | min | max | irsz | telepules | telepulesresz | kozternev | kozterjelleg | hazszam | hazszambetujel | hazszamvege | lepcsohaz | em | ajto | ajtobetujel -------+-------+-------+------+-----------+---------------+-----------+--------------+---------+----------------+-------------+-----------+----+------+------------- 2 | 51611 | 55269 | 2092 | Budakeszi | | Foo | utca | 46 | | | | | | (1 row) The main problem is that there are two records with the same data set. The real street name was altered with 'Foo' in this bugreport. I tried to reproduce the issue locally without success. I encountered this bug in multiple databases with different table and data. I could upgrade the database to version 11 with "pg_upgradecluster 11 main -m upgrade -k", and the issue still remain. To make sure it's not a human error, I tried to dump the data and restore it. I got duplicate key error as expected.
On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form <noreply@postgresql.org> wrote: > "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE > CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg, > hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel) > I tried to reproduce the issue locally without success. I encountered this > bug in multiple databases with different table and data. I could upgrade the > database to version 11 with "pg_upgradecluster 11 main -m upgrade -k", and > the issue still remain. To make sure it's not a human error, I tried to dump > the data and restore it. I got duplicate key error as expected. How old is the duplicate data? I guess you are using the Hungarian locale as the default collation for your database (see "Collate" in the output of \l, or check for explicit collations on the relevant columns). I think that is one of the collations that has moved around a bit in recent years in glibc. For example: https://sourceware.org/bugzilla/show_bug.cgi?id=13547 -- Thomas Munro https://enterprisedb.com
On Tue, Aug 13, 2019 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form > <noreply@postgresql.org> wrote: > > "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE > > CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg, > > hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel) > How old is the duplicate data? I guess you are using the Hungarian > locale as the default collation for your database (see "Collate" in > the output of \l, or check for explicit collations on the relevant > columns). I think that is one of the collations that has moved around > a bit in recent years in glibc. For example: > > https://sourceware.org/bugzilla/show_bug.cgi?id=13547 ... and to give a little context about why that is relevant, please see these links: https://lists.debian.org/debian-glibc/2019/03/msg00030.html https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html They're discussing a big change that affects almost everyone upgrading their glibc (and similar things happen with non-GNU OSes' libc too), but in the past there have been upgrades that affected specific individual locales. German and Hungarian are known examples from recent memory, like the commits referenced in that Bugzilla ticket; you could probably figure out which glibc versions those landed in and see if that matches your Ubuntu upgrade history. For example, you can see that they changed their minds about the sort order of "ssz" vs "szsz" (you can probably see that by piping a file containing "kasza" and "kassza" through the sort command on an old enough and new enough Ubuntu release with LANG=hu_HU.UTF-8), and the hypothesis is that some rule change like that caused us to take a wrong turn while descending a btree and then conclude that your duplicated street name wasn't already in the index, when in fact it was. Oops. As Christoph mentioned on the Debian glibc mailing list, we'd like to be able to be able to handle this sort of thing better; we're entirely dependent on the OS vendor maintaining a stable sort order so that we can keep our indexes in the right order, but POSIX offers us no way to know when they've changed (a problem I'm contemplating raising with the Austin Group/POSIX maintainers). One option is to use ICU collations, because we have some limited ability to track when definitions changed, potentially invalidating an index, but we haven't finished adding ICU support yet: for now you can't use an ICU collation as a database default. You can still use it explicitly though. It's an unfortunate problem; we looked into whether we could query the glibc version, but it turned out that some of the distributions back-patch the locale changes without changing the glibc version. About the only easy way you can detect this problem is to run the 'amcheck' index checker, or just assume the worst and REINDEX, after an OS upgrade. -- Thomas Munro https://enterprisedb.com