Re: Collation versioning

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2020-02-05 16:17:25
Message-ID: CAOBaU_ZvdNzXZkYUNcm-rTWJnAdrTtTpZPfQEe1rjm_AQ66VLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 29, 2020 at 9:58 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Tue, Jan 28, 2020 at 1:06 PM Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> > 0001-0003 seem pretty much OK. Why is pg_depend.refobjversion of type
> > "name" whereas the previous pg_collation.collversion was type "text"?
> > Related to that, we previously used null to indicate an unknown
> > collation version, and now it's an empty string.
>
> That's what Thomas implemented when he started to work on it and I
> simply kept it that way until now. I'm assuming that it was simply to
> avoid wasting time on the varlena stuff while working on the
> prototype, so barring any objections I'll change to nullable text
> column in the next revision.

Done

> > The documentation snippet for this patch talks about upgrades from PG12
> > to later. But what about upgrades on platforms where we currently don't
> > have collation versioning but might introduce it later (FreeBSD,
> > Windows)? This needs to be generalized.
>
> Good point, I'll try to improve that.

Done.

> > I think this whole thing needs more tests. We are designing this
> > delicate mechanism but have no real tests for it. We at least need to
> > come up with a framework for how to test this automatically, so that we
> > can add more test cases over time as people will invariably report
> > issues when this hits the real world.
>
> Indeed. I have some unlikely index test cases I'm for now using to
> validate the behavior, but didn't start a real test infrastructure.
> I'll also work on that for the next revision, although I'll need some
> more thinking on how to properly test the pg_upgrade part.

So I added all tests I could think of to validate the correct behavior
of all the new stuff. Mostly:

- tests to make sure that we properly track a collation version for
various case of collation hidden somewhere in index definitions
- tests for pg_dump in binary upgrade mode to make sure that the
collation version (or the lack of known version) is correctly
preserved. I also modified pg_dump TAP tests to restore the binary
dump on an instance in binary mode, redump it and rerun the related
testsuite. While doing that I also realized that the previous support
for unknown version for partly broken, as it's possible to end up with
a database where only part of the collation versions are none. I
fixed it, adding a new "DEPEND ON COLLATION x UNKNOWN VERSION"
alternative for that case, with proper pg_dump support and required
tests
- new pg_dump TAP test mode with both --binary-upgrade and the new
--unknown-collations-binary-compatible switch, to make sure that we
don't dump the UNKNOWN VERSION orders when the
--collation-binary-compatible pg_upgrade option is used
- test for the ALTER INDEX name DEPENDS ON COLLATION name CURRENT VERSION

To avoid too many platform dependent behavior, I restricted the tests
for ICU collation only, with the required changes to ignore the tests
when postgres isn't compile with ICU support. One exception is a
couple of test to validate that we correctly add dependencies for
default collation, as we for now only support libc default collation.
It means that as written, the collate.icu.utf8 test will need an
alternative output for glibc platforms (which I didn't added yet, as
I'm sure there'll be other changes required, so let's avoid the pain
of maintaining it for now), as I've been testing the expected file
against macos.

Note that I didn't change any syntax (or switched to native functions
for the binary pg_dump) as it's still not clear to me what exactly
should be implemented.

Attachment Content-Type Size
0001-Remove-pg_collation.collversion-v7.patch application/octet-stream 23.4 KB
0003-Implement-type-regcollation-v7.patch application/octet-stream 8.9 KB
0002-Add-pg_depend.refobjversion-v7.patch application/octet-stream 11.6 KB
0005-Preserve-index-dependencies-on-collation-during-pg_u-v7.patch application/octet-stream 40.9 KB
0004-Track-collation-versions-for-indexes-v7.patch application/octet-stream 50.9 KB
0006-Add-a-new-ALTER-INDEX-name-DEPENDS-ON-COLLATION-name-v7.patch application/octet-stream 23.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-02-05 16:31:38 Re: Add %x to PROMPT1 and PROMPT2
Previous Message Magnus Hagander 2020-02-05 16:16:14 Re: Add %x to PROMPT1 and PROMPT2