Collation versioning

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Collation versioning
Date: 2018-09-04 17:02:32
Message-ID: CAEepm=0uEQCpfq_+LYFBdArCe4Ot98t1aR4eYiYTe=yavQygiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

While reviewing the ICU versioning work a while back, I mentioned the
idea of using a user-supplied command to get a collversion string for
the libc collation provider. I was reminded about that by recent news
about an upcoming glibc/CLDR resync that is likely to affect
PostgreSQL users (though, I guess, probably only when they do a major
OS upgrade). Here's an experimental patch to try that idea out. For
example, you might set it like this:

libc_collation_version_command = 'md5
/usr/share/locale/@LC_COLLATE@/LC_COLLATE | sed "s/.* = //"'

... or, on a Debian system using the locales package, like this:

libc_collation_version_command = 'dpkg -s locales | grep Version: |
sed "s/Version: //"'

Using the checksum approach, it works like this:

postgres=# alter collation "xx_XX" refresh version;
NOTICE: changing version from b88d621596b7e61337e832f7841066a9 to
7b008442fbaf5dfe7a10fb3d82a634ab
ALTER COLLATION
postgres=# select * from pg_collation where collname = 'xx_XX';
-[ RECORD 1 ]-+---------------------------------
collname | xx_XX
collnamespace | 2200
collowner | 10
collprovider | c
collencoding | 6
collcollate | en_US.UTF-8
collctype | UTF-8
collversion | 7b008442fbaf5dfe7a10fb3d82a634ab

When the collation definition changes you get the desired scary
warning on next attempt to use it in a fresh backend:

postgres=# select * from t order by v;
WARNING: collation "xx_XX" has version mismatch
DETAIL: The collation in the database was created using version
b88d621596b7e61337e832f7841066a9, but the operating system provides
version 7b008442fbaf5dfe7a10fb3d82a634ab.
HINT: Rebuild all objects affected by this collation and run ALTER
COLLATION public."xx_XX" REFRESH VERSION, or build PostgreSQL with the
right library version.

The problem is that it isn't in effect at initdb time so if you add
that later it only affects new locales. You'd need a way to do that
during init to capture the imported system locale versions, and that's
a really ugly string to have to pass into some initdb option. Ugh.

Another approach would be to decide that we're willing to put
non-portable version extracting magic in pg_locale.c. On a long
flight I hacked my libc to store a version string (based on CLDR
version or whatever) in its binary locale definitions and provide a
proper interface to ask for it, modelled on querylocale(3):

const char *querylocaleversion(int mask, locale_t locale);

Then the patch for pg_locale.c is trivial, see attached. While I
could conceivably try to convince my local friendly OS to take such a
patch, the real question is how to deal with glibc. Does anyone know
of a way to extract a version string from glibc using existing
interfaces? I heard there was an undocumented way but I haven't been
able to find it -- probably because I was, erm, looking in the
documentation.

Or maybe this isn't worth bothering with, and we should just build out
the ICU support and then make it the default and be done with it.

In passing, here's a patch to add tab completion for ALTER COLLATION
... REFRESH VERSION.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Auto-complete-ALTER-COLLATION-.-REFRESH-VERSION.patch application/octet-stream 910 bytes
0001-Add-libc_collation_version_command-GUC.patch application/octet-stream 3.5 KB
0001-Add-libc_collation_version_command-GUC.patch application/octet-stream 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-09-04 17:04:42 Re: Collation versioning
Previous Message Alvaro Herrera 2018-09-04 16:50:25 Re: pointless check in RelationBuildPartitionDesc