Re: Collation versioning

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2019-05-28 19:00:00
Message-ID: CA+hUKGLaR8zXhXhirNG5aT2EQP8ARFimHFai7iUgdYz_jNkocw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 28, 2018 at 9:30 AM Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 16/09/2018 20:12, Douglas Doole wrote:
> > All this collation stuff is great, and I know users want it, but it
> > feels like were pushing them out of an airplane with a ripped parachute
> > every time the collation libraries change. Maybe they'll land safely or
> > maybe things will get very messy.
>
> At some point, a schema designer also needs to take some responsibility
> for making smart choices for longevity. It is known that collations can
> change, and the sort of changes that can happen are also generally
> understood. So if you want to use range partitioning on text fields,
> maybe you shouldn't, or at least choose the ranges conservatively.
> Similarly, maybe you shouldn't have timestamp range partition boundaries
> around DST changes or on the 29th of every month, and maybe you
> shouldn't partition float values at negative zero. Some ideas are
> better than others. We will help you recognize and fix breakage, but we
> can't prevent it altogether.

Since there's a chance of an "unconference" session on locale versions
tomorrow at PGCon, here's a fresh rebase of the patchset to add
per-database-object collation version tracking. It doesn't handle
default collations yet (not hard AFAIK, will try that soon), but it
does work well enough to demonstrate the generate principal. I won't
attach the CHECK support just yet, because it needs more work, but the
point of it was to demonstrate that pg_depend can handle this for all
kinds of database objects in one standard way, rather than sprinkling
collation version stuff all over the place in pg_index, pg_constraint,
etc, and I think it did that already.

postgres=# create table t (k text collate "en-x-icu");
CREATE TABLE
postgres=# create index on t(k);
CREATE INDEX
postgres=# select refobjversion from pg_depend where refobjversion != '';
refobjversion
---------------
153.72
(1 row)

Mess with it artificially (or install a different version of ICU):
postgres=# update pg_depend set refobjversion = '42' where
refobjversion = '153.72';
UPDATE 1

In a new session, we get a warning when first loading the index
because the version doesn't match:
postgres=# select * from t where k = 'x';
psql: WARNING: index "t_k_idx" depends on collation 12711 version
"42", but the current version is "153.72"
DETAIL: The index may be corrupted due to changes in sort order.
HINT: REINDEX to avoid the risk of corruption.
k
---
(0 rows)

The warning can be cleared for the indexes on that one table like so:
postgres=# reindex table t;
REINDEX

You can see that it's captured the new version:
postgres=# select refobjversion from pg_depend where refobjversion != '';
refobjversion
---------------
153.72
(1 row)

--
Thomas Munro
https://enterprisedb.com

Attachment Content-Type Size
0001-Remove-pg_collation.collversion.patch application/octet-stream 20.8 KB
0002-Add-pg_depend.refobjversion.patch application/octet-stream 10.3 KB
0003-Track-collation-versions-for-indexes.patch application/octet-stream 10.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-05-28 19:04:46 Re: Confusing error message for REINDEX TABLE CONCURRENTLY
Previous Message Andres Freund 2019-05-28 18:37:54 Re: Indexing - comparison of tree structures