Re: Collation versioning

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2018-09-18 12:48:35
Message-ID: 20180918124835.GZ4184@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Thomas Munro (thomas(dot)munro(at)enterprisedb(dot)com) wrote:
> On Mon, Sep 17, 2018 at 9:02 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Thomas Munro (thomas(dot)munro(at)enterprisedb(dot)com) wrote:
> > > On Mon, Sep 17, 2018 at 6:13 AM Douglas Doole <dougdoole(at)gmail(dot)com> wrote:
> > > > On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > > >> 3. Fix the tracking of when reindexes need to be rebuilt, so that you
> > > >> can't get it wrong (as you're alluding to above).
> > > >
> > > > I've mentioned this in the past, but didn't seem to get any traction, so I'll try it again ;-)
> > >
> > > Probably because we agree with you, but don't have all the answers :-)
> >
> > Agreed.
> >
> > > > The focus on indexes when a collation changes is, in my opinion, the least of the problems. You definitely have to worry about indexes, but they can be easily rebuilt. What about other places where collation is hardened into the system, such as constraints?
> > >
> > > We have to start somewhere and indexes are the first thing that people
> > > notice, and are much likely to actually be a problem (personally I've
> > > encountered many cases of index corruption due to collation changes in
> > > the wild, but never a constraint corruption, though I fully understand
> > > the theoretical concern). Several of us have observed specifically
> > > that the same problems apply to CHECK constraints and PARTITION
> > > boundaries, and there may be other things like that. You could
> > > imagine tracking collation dependencies on those, requiring a RECHECK
> > > or REPARTITION operation to update them after a depended-on collation
> > > version changes.
> > >
> > > Perhaps that suggests that there should be a more general way to store
> > > collation dependencies -- something more like pg_depend, rather than
> > > bolting something like indcollversion onto indexes and every other
> > > kind of catalog that might need it. I don't know.
> >
> > Agreed. If we start thinking about pg_depend then maybe we realize
> > that this all comes back to pg_attribute as the holder of the
> > column-level information and maybe what we should be thinking about is a
> > way to encode version information into the typmod for text-based
> > types...
>
> So to be more concrete: pg_depend could have a new column
> "refobjversion". Whenever indexes are created or rebuilt, we'd
> capture the current version string in the pg_depend rows that link
> index attributes and collations. Then we'd compare those against the
> current value when we first open an index and complain if they don't
> match. (In this model there would be no "collversion" column in the
> pg_collation catalog.)

I'm really not sure why you're pushing to have this in pg_depend..

> That'd leave a place for other kinds of database objects (CHECKs,
> PARTITIONS, ...) to store their version dependency, if someone later
> wants to add support for that.

Isn't what matters here where the data's stored, as in, in a column..?

All of those would already have dependencies on the column so that they
can be tracked back there.

> I'm not sure if my idea about updating the default collation row in
> newly created databases has legs though. Any thoughts on that?

My initial reaction is that we should have a version included basically
everywhere and then let users decide how they want to change it. For a
new cluster, I'd agree with using the latest available (while allowing
it to be chosen if a user wishes for something else) but I'm not sure
I'd go farther than that.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-09-18 13:42:21 Re: [HACKERS] Bug in to_timestamp().
Previous Message Yugo Nagata 2018-09-18 12:34:42 Re: Unused argument from execute_sql_string()