Re: Collation versioning

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2019-11-04 21:15:43
Message-ID: CA+hUKGJk_Hc5DBiEcUxCqWhmzRwq9TfNZ2P1zjP14DrPczFEHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 4, 2019 at 11:13 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > * Some have expressed doubt that pg_depend is the right place for
> > this; let's see if any counter-proposals appear.
>
> When working on the REINDEX FILTER, I totally missed this thread and
> wrote a POC saving the version in pg_index. That's not ideal though,
> as you need to record multiple version strings. In my version I used
> a json type, using the collprovider as the key, but that's not enough
> for ICU as each collation can have a different version string. I'm
> not a huge fan of using pg_depend to record the version, but storing a
> collprovider/collname -> version per row in pg_index is definitely a
> no go, so I don't have any better counter-proposal.

Yeah, I also thought about using pg_index directly, and was annoyed by
the denormalisation you mention (an array of {collation, version}!?)
and so I realised I wanted another table like they teach you at
database school, but I also realised that there are other kinds of
database objects that depend on collations and that can become
corrupted if the collation definition changes. It was thinking about
that that lead me to the idea of using something that can record
version dependences on *any* database object, which brought me to the
existing pg_depend table.

Concretely, eventually we might want to support checks etc, as
mentioned by Doug Doole and as I showed in an earlier version of this
POC patch, though I removed it from the more recent patch set so we
can focus on the more pressing problems. The check constraint idea
leads to more questions like: "does this constraint *really* use any
operators that truly depend on the collation definition?" (so CHECK
(name > 'xxx') depends on name's collation, but CHECK (LENGTH(name) <
32) doesn't really), and I didn't want to be distracted by that rabbit
hole. Here's the example message that came out of the earlier patch
for posterity:

WARNING: constraint "t_i_check" depends on collation 12018 version
"30.0.1", but the current version is "30.0.2"
DETAIL: The constraint may be corrupted due to changes in sort order.
HINT: Drop and recreate the constraint to avoid the risk of corruption.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-11-04 21:26:38 Re: [PATCH] Sort policies and triggers by table name in pg_dump.
Previous Message Thomas Munro 2019-11-04 20:58:51 Re: Collation versioning