16: Collation versioning and dependency helpers

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: 16: Collation versioning and dependency helpers
Date: 2022-10-30 04:41:16
Message-ID: 51fb77507cafd43fc1a2e733c23045873d93ae60.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Motivation:

We haven't fully solved the changing collation-provider problem. An
upgrade of the OS may change the version of libc or icu, and that might
affect the collation, which could leave you with various corrupt
database objects including:

* indexes
* constraints
* range types or multiranges (or other types dependent
on collation for internal consistency)
* materialized views
* partitioned tables (range or hash)

There's discussion about trying to reliably detect these changes and
remedy them. But there are major challenges; for instance, glibc
doesn't give a reliable signal that a collation may have changed, which
would leave us with a lot of false positives and create a new set of
problems (e.g. reindexing when it's unnecessary). And even with ICU, we
don't have a way to support multiple versions of a provider or of a
single collation, so trying to upgrade would still be a hassle.

Proposal:

Add in some tools to make it easier for administrators to find out if
they are at risk and solve the problem for themselves in a systematic
way.

Patches:

0001: Treat "default" collation as unpinned, so that entries in
pg_depend are created. The rationale is that, since the "default"
collation can change, it's not really an immutable system object, and
it's worth tracking which objects are affected by it. It seems to bloat
pg_depend by about 5-10% though -- that doesn't seem great, but I'm not
sure if it's a real problem or not.

0002: Enable pg_collation_actual_version() to work on the default
collation (oid=100) so that it doesn't need to be treated as a special
case.

0003: Fix ALTER COLLATION "default" REFRESH VERSION, which currently
throws an unhelpful internal error. Instead, issue a more helpful error
that suggests "ALTER DATABASE ... REFRESH COLLATION VERSION" instead.

0004: Add system views:
pg_collation_versions: quickly see the current (from the catalog)
and actual (from the provider) versions of each collation
pg_collation_dependencies: map of objects to the collations they
depend on

Along with these patches, you can use some tricks to verify data, such
as /contrib/amcheck; or fix the data with things like:

* REINDEX
* VACUUM FULL/TRUNCATE/CLUSTER
* REFRESH MATERIALIZED VIEW

And then refresh the collation version when you're confident that your
data is valid.

TODO:

* The dependencies view is not rigorously complete, because the
directed dependency graph doesn't quite establish an "affected by"
relationship. One exception is that a composite type doesn't depend on
its associated relation, so a composite type over a range type doesn't
depend on the range type.
* Consider adding in some verification helpers that can verify that a
value is still valid (e.g. a range type that depends on a collation
might have corrupt values). We could have a collation verifier for
types that are collation-dependenent, or perhaps just go through the
input and output functions and catch any errors.
* Consider better tracking of which collation versions were active on
a particular object since the last REINDEX (or REFRESH MATERIALIZED
VIEW, TRUNCATE, or other command that would remove any trace of data
affected by the previous collation version).

Regards,
Jeff Davis

Attachment Content-Type Size
v1-0001-Count-the-default-collation-as-an-unpinned-dependenc.patch text/x-patch 15.3 KB
v1-0002-Enable-pg_collation_actual_version-to-work-on-the-de.patch text/x-patch 3.4 KB
v1-0003-Fix-ALTER-COLLATION-default-REFRESH-VERSION.patch text/x-patch 1.1 KB
v1-0004-Add-views-pg_collation_versions-and-pg_collation_dep.patch text/x-patch 10.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-10-30 06:10:41 Re: 16: Collation versioning and dependency helpers
Previous Message Noah Misch 2022-10-30 03:16:39 Re: has_wal_read_bug