Re: Collation versioning

From: Douglas Doole <dougdoole(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2018-09-17 17:01:13
Message-ID: CADE5jYKEa81RmX5RTvDnD+TQsdSVKks1QazD4aoiz+4SNV1=5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 16, 2018 at 1:14 PM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> We have to start somewhere and indexes are the first thing that people
> notice, and are much likely to actually be a problem (personally I've
> encountered many cases of index corruption due to collation changes in
> the wild, but never a constraint corruption,

Problems that people notice are good - it's the ones that they don't notice
that cause real grief ;-)

Given that PostgreSQL requires equal values to be binary identical I think
that you'll avoid a lot of the problems that caused me so much trouble in
DB2. Even if someone creates a range constraint or partitioned table, the
boundary values aren't typically going to be impacted (my French example
was a little contrived). That said, there have been a few changes that
would have much more obvious impacts. One language stopped treating 'ch' as
a single collating element that came somewhere after 'c'+'z' and treated it
as 'c'+'h' instead. And and one of the North Germanic languages moved an
accented character from after 'z' to before 'z' (or maybe it was vice versa
- I miss my library of Unicode presentations).

Once you get into downstream effects of changes (whether they are
>
recorded in the database or elsewhere), I think it's basically beyond
> our event horizon.

...

> I think we should make a clear distinction between
> things that invalidate the correct working of the database, and more
> nebulous effects that we can't possibly track in general.
>

I agree that PostgreSQL can't resolve the downstream changes, but that's a
very subtle distinction. As a user, if an upgrade caused my data to no
longer comply with my carefully architected and database enforced BI rules,
I would definitely argue that the correct working of the database has been
invalidated. (You can make technical arguments about the OS upgrading the
library, but the fundamental issue is that my PostgreSQL database is
broken.)

You can probably argue that PostgreSQL and DB2 users look at the world
differently, but that's why DB2 ended up shipping its own copies of the ICU
library. Once a user creates an object using ICU vX, we made sure that
version of the library was always available to avoid these problems. (The
libraries were on a private path with non-standard names so there was no
collision with the OS library. Fortunately I'd moved on before anyone
really started complaining about why 5 copies of ICU were being installed
when they weren't even using Unicode.)

> [1]
> https://www.postgresql.org/message-id/CAEepm%3D30SQpEUjau%3DdScuNeVZaK2kJ6QQDCHF75u5W%3DCz%3D3Scw%40mail.gmail.com

I'd missed this post. In it you asked:

I wonder how it manages to deal with fr_CA's reversed secondary weighting
rule which requires you to consider diacritics in reverse order --
apparently abandoned in France but still used in Canada -- using a fixed
size space for state between calls.

ucol_nextSortKeyPart() only keeps track of the current position in the
generated sort key as its state. So, if you call it multiple times to
generate the sort key piecemeal, it recomputes the entire sort key until it
has enough bytes to satisfy your request. (That is, if you're doing 4 bytes
at a time, on the first call it generates and returns bytes 1-4. On the
second call it generates 1-8 and returns 5-8. Next it generates 1-12 and
returns 9-12.) Needless to say, this gets very expensive very fast.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-09-17 17:11:06 Re: Online verification of checksums
Previous Message Tomas Vondra 2018-09-17 17:00:31 Re: Online verification of checksums