Re: Catalog invalidations vs catalog scans vs ScanPgRelation()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Catalog invalidations vs catalog scans vs ScanPgRelation()
Date: 2020-04-09 20:56:03
Message-ID: CA+TgmoYaFYRRdRZ94p_Qdt+1oONg6sMOvbkGHKVsFtONCrFkhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ belatedly responding ]

On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand()? Outside of that there ought (with exception
> of multi-transaction commands, but they have to be careful anyway) to be
> no "in progress" sequences of related catalog lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

If I understand correctly, the scenario you're concerned about is
something like this:

(1) Transaction #1 reads a catalog tuple and immediately releases its snapshot.
(2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
(3) Transaction #3 completes a VACUUM on the table, so that the old
tuple is pruned, thus marked dead, and then the TID is marked unused.
(4) Transaction #4 performs an INSERT which reuses the same TID.
(5) Transaction #1 now performs a DELETE or UPDATE using the previous
TID and updates the unrelated tuple which reused the TID rather than
the intended tuple.

It seems to me that what is supposed to prevent this from happening is
that you aren't supposed to release your snapshot at the end of step
#1. You're supposed to hold onto it until after step #5 is complete. I
think that there are fair number of places that are already careful
about that. I just picked a random source file that I knew Tom had
written and found this bit in extension_config_remove:

extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
NULL, 1, key);

extTup = systable_getnext(extScan);
...a lot more stuff...
CatalogTupleUpdate(extRel, &extTup->t_self, extTup);

systable_endscan(extScan);

Quite apart from this issue, there's a very good reason why it's like
that: extTup might be pointing right into a disk buffer, and if we did
systable_endscan() before the last access to it, our pointer could
become invalid. A fair number of places are protected due to the scan
being kept open like this, but it looks like most of the ones that use
SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

I would be inclined to fix this problem by adjusting those places to
keep a snapshot open rather than by making some arbitrary rule about
holding onto a catalog snapshot until the end of the command. That
seems like a fairly magical coding rule that will happen to work in
most practical cases but isn't really a principled approach to the
problem. Besides being magical, it's also fragile: just deciding to
use a some other snapshot instead of the catalog snapshot causes your
code to be subtly broken in a way you're surely not going to expect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-04-09 21:49:47 Re: pgsql: Rationalize GetWalRcv{Write,Flush}RecPtr().
Previous Message Justin Pryzby 2020-04-09 20:39:49 Re: debian bugrept involving fast default crash in pg11.7