Re: Collation versioning

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2018-09-24 21:11:31
Message-ID: CAH2-WzkH_XwzHjG_zAnOtmRVnJeB8T6PUKVEJvy8yk=pFWGoZg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 24, 2018 at 1:47 PM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Personally I'm not planning to work on multi-version installation any
> time soon; I was just scoping out some basic facts about all this. I
> think the primary problem that affects most of our users is the
> shifting-under-your-feet problem, which we now see applies equally to
> libc and libicu.

Are we sure about that? Could it just be that ICU will fix bugs that
cause their strcoll()-alike and strxfrm()-alike functions to give
behavior that isn't consistent with the behavior required by the CLDR
version in use?

This seems like it might be a very useful distinction. We know that
glibc had bugs that were caused by strxfrm() not agreeing with
strcoll() -- that was behind the 9.5-era abbreviated keys issues. But
that was actually a bug in an optimization in strcoll(), rather than a
strxfrm() bug. strxfrm() gave the correct answer, which is to say the
answer that was right according to the high-level collation
definition. It merely failed to be bug-compatible with strcoll().
What's ICU supposed to do about an issue like that?
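
To make the strcoll()/strxfrm() agreement concrete, here is a minimal
sketch of the kind of consistency check that would expose a mismatch
like the glibc one. It uses Python's standard locale module, which
wraps the C library's strcoll() and strxfrm(). The helper names
(sign, find_disagreements) are made up for illustration and are not
from PostgreSQL or ICU. A mismatch says only that the two functions
disagree, not which of them is wrong.

```python
import locale
from itertools import combinations


def sign(n):
    """Collapse a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def find_disagreements(strings):
    """Return the pairs that strcoll() and strxfrm() order differently.

    Hypothetical helper: compares every pair of strings under the
    current LC_COLLATE setting, once with strcoll() and once by
    comparing the strxfrm() keys directly.
    """
    mismatches = []
    for a, b in combinations(strings, 2):
        by_strcoll = sign(locale.strcoll(a, b))
        key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
        by_strxfrm = (key_a > key_b) - (key_a < key_b)
        if by_strcoll != by_strxfrm:
            mismatches.append((a, b))
    return mismatches
```

Calling locale.setlocale(locale.LC_COLLATE, "") first selects the
environment's collation; an empty result from find_disagreements()
then means the two functions agreed on every pair in that sample,
which is the property abbreviated keys depend on.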

If we're going to continue to rely on the strxfrm() equivalent from
ICU, then it seems to me that ICU should be able to change behaviors
in a stable release, provided the behavior they're changing is down to
a bug in their infrastructure, as opposed to an organic evolution in
how some locale sorts text (CLDR update). My understanding is that ICU
is designed to decouple technical issues from issues of concern to
natural language experts, so we as an ICU client can limit ourselves
to worrying about one of the two at any given time.

--
Peter Geoghegan
