Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: tibor(at)neszt(dot)hu, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose
Date: 2019-08-13 01:49:26
Message-ID: CA+hUKG+8sChHBccNdud=z2rCaWzojz5dMBkpHQhKVmvPfuBNcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Aug 13, 2019 at 10:25 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> > "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE
> > CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg,
> > hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel)

> How old is the duplicate data? I guess you are using the Hungarian
> locale as the default collation for your database (see "Collate" in
> the output of \l, or check for explicit collations on the relevant
> columns). I think that is one of the collations that has moved around
> a bit in recent years in glibc. For example:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=13547

... and to give a little context about why that is relevant, please
see these links:

https://lists.debian.org/debian-glibc/2019/03/msg00030.html
https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html

They're discussing a big change that affects almost everyone upgrading
their glibc (and similar things happen with non-GNU OSes' libc too),
but in the past there have been upgrades that affected specific
individual locales. German and Hungarian are known examples from
recent memory, like the commits referenced in that Bugzilla ticket;
you could probably figure out which glibc versions those landed in and
see if that matches your Ubuntu upgrade history. For example, you can
see that they changed their minds about the sort order of "ssz" vs
"szsz" (you can probably see that by piping a file containing "kasza"
and "kassza" through the sort command on an old enough and new enough
Ubuntu release with LANG=hu_HU.UTF-8), and the hypothesis is that some
rule change like that caused us to take a wrong turn while descending
a btree and then conclude that your duplicated street name wasn't
already in the index, when in fact it was. Oops.

As Christoph mentioned on the Debian glibc mailing list, we'd like to
be able to be able to handle this sort of thing better; we're entirely
dependent on the OS vendor maintaining a stable sort order so that we
can keep our indexes in the right order, but POSIX offers us no way to
know when they've changed (a problem I'm contemplating raising with
the Austin Group/POSIX maintainers). One option is to use ICU
collations, because we have some limited ability to track when
definitions changed, potentially invalidating an index, but we haven't
finished adding ICU support yet: for now you can't use an ICU
collation as a database default. You can still use it explicitly
though. It's an unfortunate problem; we looked into whether we could
query the glibc version, but it turned out that some of the
distributions back-patch the locale changes without changing the glibc
version. About the only easy way you can detect this problem is to
run the 'amcheck' index checker, or just assume the worst and REINDEX,
after an OS upgrade.

--
Thomas Munro
https://enterprisedb.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2019-08-13 02:36:11 Re: 回复: BUG #15949: pg_basebackup failed(PG-12-Beta3)
Previous Message 两个孩子的爹 2019-08-13 01:18:36 回复: BUG #15949: pg_basebackup failed(PG-12-Beta3)