Re: Collation versioning

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2018-09-23 13:00:41
Message-ID: CAEepm=0rgoRY+uaCbpjo=Exd=D=jbJnv8BMTaSmy5D2DqQ532w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 19, 2018 at 1:16 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Thomas Munro (thomas(dot)munro(at)enterprisedb(dot)com) wrote:
> > On Wed, Sep 19, 2018 at 10:09 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > * Douglas Doole (dougdoole(at)gmail(dot)com) wrote:
> > > > > The CHECK constraint doesn't need to directly track that information-
> > > > > it should have a dependency on the column in the table and that's where
> > > > > the information would be recorded about the current collation version.
> > > >
> > > > Just to have fun throwing odd cases out, how would something like this be
> > > > recorded?
> > > >
> > > > Database default collation: en_US
> > > >
> > > > CREATE TABLE t (c1 TEXT, c2 TEXT, c3 TEXT,
> > > > CHECK (c1 COLLATE "fr_FR" BETWEEN c2 COLLATE "fr_FR" AND c3 COLLATE
> > > > "fr_FR"));
> > > >
> > > > You could even be really warped and apply multiple collations on a single
> > > > column in a single constraint.
> > >
> > > Once it gets to an expression and not just a simple check, I'd think
> > > we'd record it in the expression..
> >
> > Maybe I misunderstood, but I don't think it makes sense to have a
> > collation version "on the column in the table", because (1) that fails
> > to capture the fact that two CHECK constraints that were defined at
> > different times might have become dependent on two different versions
> > (you created one constraint before upgrading and the other after, now
> > the older one is invalidated and sounds the alarm but the second one
> > is fine), and (2) the table itself doesn't care about collation
> > versions since heap tables are unordered; there is no particular
> > operation on the table that would be the correct time to update the
> > collation version on a table/column. What we're trying to track is
> > when objects that in some way depend on the version become
> > invalidated, so wherever we store it there's going to have to be a
> > version recorded per dependent object at its creation time, so that's
> > either new columns on every interested catalog table, or ...
>
> Today, we work out what operators to use for a given column based on the
> data type. This is why I was trying to get at the idea of using typmod
> earlier, but if we can't make that work then I'm inclined to try and
> figure out a way to get it as close as possible to being associated with
> the type.
>
> Yes, the heap is unordered, but the type *is* ordered and we track that
> with the type system. Maybe we just extend that somehow, rather than
> using the typmod or adding columns into catalogs where we store type
> information (such as pg_attribute). Perhaps typmod could be made larger
> to be able to support this, that might be another approach.

Can you show how this would look in the catalogs, for Doug's example
above? There is no pg_attribute or similar that applies here.
Perhaps you want to bury the versions inside the serialised expression
tree? Or store it in a pair of arrays on the pg_constraint row?

> No where in this does it strike me that it makes sense to push this into
> pg_depend though.

In my scheme, the pg_depend row that already exists to record the
dependency between the pg_constraint row and the pg_collation row for
"fr_FR" holds a refobjversion value captured at constraint creation
time. Maybe that's a bit strange, but the alternatives I have thought
of so far seemed ad hoc and strange too.

Here's a Sunday afternoon rapid prototype of the concept, implemented
for indexes and check constraints. Examples of the output you get
when you access those database objects for the first time in each
backend:

WARNING: index "foo_i_idx" depends on collation 16385 version
"30.0.1", but the current version is "30.0.2"
DETAIL: The index may be corrupted due to changes in sort order.
HINT: REINDEX to avoid the risk of corruption.

WARNING: constraint "t_i_check" depends on collation 12018 version
"30.0.1", but the current version is "30.0.2"
DETAIL: The constraint may be corrupted due to changes in sort order.
HINT: Drop and recreate the constraint to avoid the risk of corruption.

I didn't try to tackle the default collation yet, so to try this you
need to use non-default collations.

Whether we'd actually want to do this for CHECK constraints or any of
this, I don't know. It's theoretically the right thing to do, but
most people probably don't want to be nagged to recreate (or recheck
in some new way) all their constraints every time they do an OS
upgrade, and in fact there is almost never anything actually wrong if
they don't... unless they're unlucky. But I wanted to do this to show
that it's a fairly general way to deal with dependencies between
database objects and collation versions using the existing links
between them in a relational sort of way. I guess you might want a
command or tool to find all such problems proactively and recheck,
reindex, repartition etc as required to make it shut up, and that
should be quite easy to drive from this design.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Remove-pg_collation.collversion.patch application/octet-stream 20.4 KB
0002-Add-pg_depend.refobjversion.patch application/octet-stream 10.5 KB
0003-Track-collation-versions-for-indexes.patch application/octet-stream 9.7 KB
0004-Track-collation-versions-for-CHECK-constraints.patch application/octet-stream 6.0 KB
0005-Use-glibc-version-in-lieu-of-collation-version-on-Li.patch application/octet-stream 4.1 KB
0006-Use-querylocale-to-get-collation-versions-on-FreeBSD.patch application/octet-stream 3.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-09-23 14:47:44 Re: Changing the setting of wal_sender_timeout per standby
Previous Message Keiichi Hirobe 2018-09-23 11:36:55 [patch]overallocate memory for curly braces in array_out