Re: Collation versioning

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2020-11-02 06:59:10
Message-ID: CAOBaU_YHbuMbwDNNjCHcA6-qyRVeR1j6qFehA1gnz+bM3UZJNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 31, 2020 at 10:28 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Fri, Oct 30, 2020 at 1:20 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > 4. I didn't really like the use of '' for unknown. I figured out how
> > to use NULL for that.
>
> Hrmph. That logic didn't work for the pattern ops case. I fixed
> that, by partially reintroducing a special value, but this time
> restricting the code that knows about that to just pg_dump, and I used
> the value 'unknown', so really it's not special at all as far as the
> server is concerned and there is only one kind of warning message.

Ok, I'm fine with that.

> Furthermore, I realised that I really don't like the policy of
> assuming that all text-related indexes imported from older releases
> need the "unknown" warning. That'll just make this feature
> unnecessarily noisy and unpopular when 14 comes out, essentially
> crying wolf, even though it's technically true that the collations in
> imported-from-before-14 indexes are of unknown version. Even worse,
> instructions might be shared around the internet to show how to shut
> the warnings up without reindexing, and then later when there really
> is a version change, someone might find those instructions and follow
> them! So I propose that the default should be to assume that indexes
> are not corrupted, unless you opt into the more pessimistic policy
> with --index-collation-versions-unknown. Done like that.

Yes, I was also worried about spamming this kind of messages after an
upgrade. Note that this was initially planned for REL_13_STABLE,
whose release date was very close to glibc 2.28, so at that time this
would have been more likely to have a real corruption on the indexes.
I'm fine with the new behavior.

> I also realised that I don't like carrying a bunch of C code to
> support binary upgrades, when it's really just a hand-coded trivial
> UPDATE of pg_depend. Is there any reason pg_dump --binary-upgrade
> shouldn't just dump UPDATE statements, and make this whole feature a
> lot less mysterious, and shorter? Done like that.

I just thought that it wouldn't be acceptable to do plain DML on the
catalogs. If that's ok, then definitely this approach is better.

> While testing on another OS that will be encountered in the build farm
> when I commit this, I realised that I needed to add --encoding=UTF8 to
> tests under src/bin/pg_dump and src/test/locale, because they now do
> things with ICU collations (if built with ICU support) and that only
> works with UTF8.

Oh I didn't know that.

> Another option would be to find a way to skip those
> tests if the encoding is not UTF8. Hmm, I wonder if it's bad to
> effectively remove the testing that comes for free from buildfarm
> animals running this under non-UTF8 encodings; but if we actually
> valued that, I suppose we'd do it explicitly as another test pass with
> SQL_ASCII.

I guess it would be better to keep checking non-UTF8 encodings, but
the current approach looks quite random and I don't have any better
suggestions.

Note that v34 now fails when run on a without that don't have
defined(__GLIBC__) (e.g. macos). The failure are in
collate.icu.utf8.sql, of the form:

- icuidx06_d_en_fr_ga | "default" | up to date
+ icuidx06_d_en_fr_ga | "default" | version not tracked

Given the new approach, the only option I can see is to simply remove
any attempt to cover the default collation in the tests. An
alternative file would be a pain to maintain and it wouldn't bring any
value apart from checking that the default collation is either always
tracked or never, but not a mix of those.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-11-02 07:00:06 reindex partitioned indexes: refactor ReindexRelationConcurrently ?
Previous Message Luc Vlaming 2020-11-02 06:53:45 Re: should INSERT SELECT use a BulkInsertState?