Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeremy Finzel <finzelj(at)gmail(dot)com>
Subject: Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Date: 2018-06-12 00:33:05
Message-ID: 20180612003305.74vnsl4scn5sdhhf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi,

On 2018-05-29 19:14:51 -0400, Alvaro Herrera wrote:
> I added an Assert(DatabasePath != NULL) to
> RelationCacheInitFilePreInvalidate() and didn't see a single crash when
> running the tests. I thought that adding a "VACUUM FREEZE pg_class"
> in the recovery tests (where there is a standby) ought to do it, but it
> doesn't. What's the deal there?
>
> Here are some proposed changes. Some of these comment edits are WIP :-)

I'm a bit confused by these changes - there seems to be some that look
like a borked diff? And a number of others look pretty unrelated?

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

> diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
> index 0d6100fb08..8a8620943f 100644
> --- a/src/backend/utils/cache/inval.c
> +++ b/src/backend/utils/cache/inval.c
> @@ -872,6 +872,8 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
> int nmsgs, bool RelcacheInitFileInval,
> Oid dbid, Oid tsid)
> {
> + Assert(InRecovery);
> +

Idk, seems unrelated.

> if (nmsgs <= 0)
> return;
>
> @@ -884,12 +886,13 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
> dbid);
>
> /*
> - * RelationCacheInitFilePreInvalidate, when the invalidation message
> - * is for a specific database, requires DatabasePath to be set, but we
> - * should not use SetDatabasePath during recovery, since it is
> - * intended to be used only once by normal backends. Hence, a quick
> - * hack: set DatabasePath directly then unset after use.
> + * When the invalidation message is for a specific database,
> + * RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
> + * but we're not allowed to use SetDatabasePath during recovery (since
> + * it is intended to be used by normal backends). Hence, a quick hack:
> + * set DatabasePath directly then unset after use.
> */
> + Assert(!DatabasePath); /* don't clobber an existing value */
> if (OidIsValid(dbid))
> DatabasePath = GetDatabasePath(dbid, tsid);

Same.

> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index aa4427724d..71b2212cbd 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -1934,6 +1934,11 @@ RelationIdGetRelation(Oid relationId)
> RelationClearRelation(rd, true);
>
> /*
> + * Normal entries are valid by now -- except nailed ones when
> + * loaded before relcache initialization. There isn't enough
> + * infrastructure yet to do pg_class lookups, but we need their
> + * rd_rel entries to be updated, so we let these through.
> + */
> * Normally entries need to be valid here, but before the relcache
> * has been initialized, not enough infrastructure exists to
> * perform pg_class lookups. The structure of such entries doesn't
> @@ -2346,8 +2351,7 @@ RelationClearRelation(Relation relation, bool rebuild)
> RelationCloseSmgr(relation);

This sure looks like it's a syntax error? So I guess you might not have
staged the removal ports of the diff?

> /*
> - * Treat nailed-in system relations separately, they always need to be
> - * accessible, so we can't blow them away.
> + * We cannot blow away nailed-in relations, so treat them especially.
> */

The former seems just as accurate, and is basically just the already
existing comment?

> if (relation->rd_isnailed)
> {
> @@ -5942,7 +5946,8 @@ write_relcache_init_file(bool shared)
> * wrote out was up-to-date.)
> *
> * This mustn't run concurrently with the code that unlinks an init file
> - * and sends SI messages, so grab a serialization lock for the duration.
> + * and sends SI messages (see RelationCacheInitFilePreInvalidate), so grab
> + * a serialization lock for the duration.
> */
> LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);

Unrelated?

> @@ -6061,6 +6066,10 @@ RelationHasUnloggedIndex(Relation rel)
> * changed one or more of the relation cache entries that are kept in the
> * local init file.
> *
> + * When DatabasePath is set, both the init file for that database and the
> + * shared (global) init files are to be removed; otherwise only the latter is.
> + * This is useful during recovery (XXX really?)
> + *

I'm confused?

> * To be safe against concurrent inspection or rewriting of the init file,
> * we must take RelCacheInitLock, then remove the old init file, then send
> * the SI messages that include relcache inval for such relations, and then
> @@ -6180,9 +6189,9 @@ unlink_initfile(const char *initfilename, int elevel)
> {
> if (unlink(initfilename) < 0)
> {
> - /* It might not be there, but log any error other than ENOENT */
> + /* It might not be there, but report any error other than ENOENT */
> if (errno != ENOENT)
> - ereport(ERROR,
> + ereport(elevel,
> (errcode_for_file_access(),
> errmsg("could not remove cache file \"%s\": %m",
> initfilename)));

Included the elevel inclusion. Can't parse the difference between log
and report here?

Greetings,

Andres Freund

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2018-06-12 00:39:14 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Merlin Moncure 2018-06-11 21:58:12 Re: Multiple PostgreSQL instances on one machine

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-06-12 00:39:14 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Andres Freund 2018-06-11 23:16:34 Re: [COMMITTERS] pgsql: Logical replication