Re: Database-level collation version tracking

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Database-level collation version tracking
Date: 2022-02-07 16:08:14
Message-ID: 20220207160814.37g7i4gowikq2yjk@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 07, 2022 at 04:44:24PM +0100, Peter Eisentraut wrote:
> On 07.02.22 11:29, Julien Rouhaud wrote:
>
> > - there is no test
>
> Suggestions where to put it? We don't really have tests for the
> collation-level versioning either, do we?

There's so limited testing in collate.* regression tests, so I thought it would
be ok to add it there. At least some ALTER DATABASE ... REFRESH VERSION would
be good, similarly to collation-level versioning.

> > - it's missing some updates in create_database.sgml, and psql tab completion
> > for CREATE DATABASE with the new collation_version defelem.
>
> Added to create_database.sgml, but not to psql. We don't have completion
> for the collation option either, since it's only meant to be used by
> pg_upgrade, not interactively.

Ok.

>
> > - that's not really something new with this patch, but should we output the
> > collation version info or mismatch info in \l / \dO?
>
> It's a possibility. Perhaps there is a question of performance if we show
> it in \l and people have tons of databases and they have to make a locale
> call for each one. As you say, it's more an independent feature, but it's
> worth looking into.

Agreed.

> > +Oid
> > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> > +{
> > + Relation rel;
> > + Oid dboid;
> > + HeapTuple tup;
> > + Datum datum;
> > + bool isnull;
> > + char *oldversion;
> > + char *newversion;
> > +
> > + rel = table_open(DatabaseRelationId, RowExclusiveLock);
> > + dboid = get_database_oid(stmt->dbname, false);
> > +
> > + if (!pg_database_ownercheck(dboid, GetUserId()))
> > + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
> > + stmt->dbname);
> > +
> > + tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
> > + if (!HeapTupleIsValid(tup))
> > + elog(ERROR, "cache lookup failed for database %u", dboid);
> >
> > Is that ok to not obtain a lock on the database when refreshing the collation?
>
> That code takes a RowExclusiveLock on pg_database. Did you have something
> else in mind?

But that lock won't prevent a concurrent DROP DATABASE, so it's totally
expected to hit that cache lookup failed error. There should either be a
shdepLockAndCheckObject(), or changing the error message to some errmsg("data
xxx does not exist").

> > + /*
> > + * Check collation version. See similar code in
> > + * pg_newlocale_from_collation().
> > + */
> > + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
> > + &isnull);
> > + if (!isnull)
> > + {
> >
> > This (and pg_newlocale_from_collation()) reports a problem if a recorded
> > collation version is found but there's no reported collation version.
> > Shouldn't it also complain if it's the opposite? It's otherwise a backdoor to
> > make sure there won't be any check about the version anymore, and while it can
> > probably happen if you mess with the catalogs it still doesn't look great.
>
> get_collation_actual_version() always returns either null or not null for a
> given installation. So the situation that the stored version is null and
> the actual version is not null can only happen as part of a software
> upgrade. In that case, all uses of collations after an upgrade would
> immediately start complaining about missing versions, which seems like a bad
> experience. Users can explicitly opt in to version tracking by running
> REFRESH VERSION once.

Ah right, I do remember that point which was also discussed in the collation
version tracking. Sorry about the noise.

> > + /*
> > + * template0 shouldn't have any collation-dependent objects, so unset
> > + * the collation version. This avoids warnings when making a new
> > + * database from it.
> > + */
> > + "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n",
> >
> > I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
> > in the source database (whether template or not)?
> >
> > # update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1');
> > UPDATE 2
> >
> > # create database test1 template postgres;
> > CREATE DATABASE
> >
> > # create database test2 template template1;
> > CREATE DATABASE
> >
> > # \c test2
> > WARNING: database "test2" has a collation version mismatch
>
> I don't understand what the complaint is here. It seems to work ok?

The comment says that you explicitly set a NULL collation version to avoid
warning when creating a database using template0 as a template, presumably
after a collation lib upgrade.

But as far as I can see the source database collation version is not checked
when creating a new database, so it seems to me that either the comment is
wrong, or we need another check for that. The latter seems preferable to me.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-07 16:11:53 Re: Make relfile tombstone files conditional on WAL level
Previous Message David G. Johnston 2022-02-07 16:04:57 Re: Storage for multiple variable-length attributes in a single row