Re: Database-level collation version tracking

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Database-level collation version tracking
Date: 2022-02-07 10:29:47
Message-ID: 20220207102947.6npnjyilcy4cgkxt@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 01, 2022 at 04:20:14PM +0100, Peter Eisentraut wrote:
> This patch adds to database objects the same version tracking that collation
> objects have.

This version conflicts with 87669de72c2 (Some cleanup for change of collate and
ctype fields to type text), so I'm attaching a simple rebase of your patch to
make the cfbot happy, no other changes.

> There is a new pg_database column datcollversion that stores
> the version, a new function pg_database_collation_actual_version() to get
> the version from the operating system, and a new subcommand ALTER DATABASE
> ... REFRESH COLLATION VERSION.
>
> This was not originally added together with pg_collation.collversion, since
> originally version tracking was only supported for ICU, and ICU on a
> database-level is not currently supported. But we now have version tracking
> for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so
> this is useful to have now. And of course ICU on database-level is being
> worked on at the moment as well.
> This patch is pretty much complete AFAICT.

Agreed. Here's a review of the patch:

- there should be a mention to the need for a catversion bump in the message
comment
- there is no test
- it's missing some updates in create_database.sgml, and psql tab completion
for CREATE DATABASE with the new collation_version defelem.

- that's not really something new with this patch, but should we output the
collation version info or mismatch info in \l / \dO?

+ if (!actual_versionstr)
+ ereport(ERROR,
+ (errmsg("database \"%s\" has no actual collation version, but a version was specified",
+ name)));-

this means you can't connect on such a database anymore. The level is probably
ok for collation version check, but for db isn't that too much?

+Oid
+AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
+{
+ Relation rel;
+ Oid dboid;
+ HeapTuple tup;
+ Datum datum;
+ bool isnull;
+ char *oldversion;
+ char *newversion;
+
+ rel = table_open(DatabaseRelationId, RowExclusiveLock);
+ dboid = get_database_oid(stmt->dbname, false);
+
+ if (!pg_database_ownercheck(dboid, GetUserId()))
+ aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
+ stmt->dbname);
+
+ tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for database %u", dboid);

Is that ok to not obtain a lock on the database when refreshing the collation?
I guess it's not worth bothering as it can only lead to spurious messages for
connection done concurrently, but there should be a comment to clarify it.
Also, it means that someone can drop the database concurrently. So it's
should be a "database does not exist" rather than a cache lookup failed error
message.

+ /*
+ * Check collation version. See similar code in
+ * pg_newlocale_from_collation().
+ */
+ datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
+ &isnull);
+ if (!isnull)
+ {

This (and pg_newlocale_from_collation()) reports a problem if a recorded
collation version is found but there's no reported collation version.
Shouldn't it also complain if it's the opposite? It's otherwise a backdoor to
make sure there won't be any check about the version anymore, and while it can
probably happen if you mess with the catalogs it still doesn't look great.

+ /*
+ * template0 shouldn't have any collation-dependent objects, so unset
+ * the collation version. This avoids warnings when making a new
+ * database from it.
+ */
+ "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n",

I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
in the source database (whether template or not)?

# update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1');
UPDATE 2

# create database test1 template postgres;
CREATE DATABASE

# create database test2 template template1;
CREATE DATABASE

# \c test2
WARNING: database "test2" has a collation version mismatch
DETAIL: The database was created using collation version meh, but the operating system provides version 2.34.
HINT: Rebuild all objects affected by collation in this database and run ALTER DATABASE test2 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
You are now connected to database "test2" as user "rjuju".

The rest of the patch looks good to me. There's notably pg_dump and pg_upgrade
support so it indeed looks complete.

> One bonus thing would be to add
> a query to the ALTER DATABASE ref page similar to the one on the ALTER
> COLLATION ref page that identifies the objects that are affected by outdated
> collations. The database version of that might just show all
> collation-using objects or something like that. Suggestions welcome.

That would be nice, but that's something quite hard to do and the resulting
query would be somewhat unreadable.
Also, you need to look at custom data types, expression and quals at least, so
I'm not even sure that you can actually do it in pure SQL without additional
infrastructure.

> Also, one curious behavior is that if you get to a situation where the
> collation version is mismatched, every time an autovacuum worker launches
> you get the collation version mismatch warning in the log. Maybe that's
> actually correct, but maybe we want to dial it down a bit for
> non-interactive sessions.

I'm +0.5 to keep it that way. Index corruption is a real danger, so if you
have enough autovacuum worker launched to have a real problem with that, you
clearly should take care of the problem even faster.

Attachment Content-Type Size
v2-0001-Database-level-collation-version-tracking.patch text/plain 25.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-02-07 14:03:56 Re: Support escape sequence for cluster_name in postgres_fdw.application_name
Previous Message Frédéric Yhuel 2022-02-07 10:26:07 Allow parallel plan for referential integrity checks?