Re: Collation versioning

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2020-10-31 02:27:45
Message-ID: CA+hUKG+JrRdBVBvSZLvhAOnTcBF6CASj9BkUjyq9OqG2=BErrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 30, 2020 at 1:20 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> 4. I didn't really like the use of '' for unknown. I figured out how
> to use NULL for that.

Hrmph. That logic didn't work for the pattern ops case. I fixed
that, by partially reintroducing a special value, but this time
restricting the code that knows about that to just pg_dump, and I used
the value 'unknown', so really it's not special at all as far as the
server is concerned and there is only one kind of warning message.

Furthermore, I realised that I really don't like the policy of
assuming that all text-related indexes imported from older releases
need the "unknown" warning. That'll just make this feature
unnecessarily noisy and unpopular when 14 comes out, essentially
crying wolf, even though it's technically true that the collations in
imported-from-before-14 indexes are of unknown version. Even worse,
instructions might be shared around the internet to show how to shut
the warnings up without reindexing, and then later when there really
is a version change, someone might find those instructions and follow
them! So I propose that the default should be to assume that indexes
are not corrupted, unless you opt into the more pessimistic policy
with --index-collation-versions-unknown. Done like that.

I also realised that I don't like carrying a bunch of C code to
support binary upgrades, when it's really just a hand-coded trivial
UPDATE of pg_depend. Is there any reason pg_dump --binary-upgrade
shouldn't just dump UPDATE statements, and make this whole feature a
lot less mysterious, and shorter? Done like that.

While testing on another OS that will be encountered in the build farm
when I commit this, I realised that I needed to add --encoding=UTF8 to
tests under src/bin/pg_dump and src/test/locale, because they now do
things with ICU collations (if built with ICU support) and that only
works with UTF8. Another option would be to find a way to skip those
tests if the encoding is not UTF8. Hmm, I wonder if it's bad to
effectively remove the testing that comes for free from buildfarm
animals running this under non-UTF8 encodings; but if we actually
valued that, I suppose we'd do it explicitly as another test pass with
SQL_ASCII.

Attachment Content-Type Size
v34-0001-Remove-pg_collation.collversion.patch text/x-patch 27.7 KB
v34-0002-Add-pg_depend.refobjversion.patch text/x-patch 12.2 KB
v34-0003-Track-collation-versions-for-indexes.patch text/x-patch 112.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-10-31 03:30:13 Re: Add important info about ANALYZE after create Functional Index
Previous Message Justin Pryzby 2020-10-31 02:08:01 Re: bulk typos