|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>|
|Subject:||Catalog invalidations vs catalog scans vs ScanPgRelation()|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
While self reviewing a patch I'm about to send I changed the assertion
in index_getnext_tid from
to instead test (via an indirection)
Without ->xmin being set, it's not safe to do scans. And
checking RecentGlobalXmin doesn't really do much.
But, uh. That doesn't get very far through the regression tests.
Turns out that I am to blame for that. All the way back in 9.4. For
logical decoding I needed to make ScanPgRelation() use a specific type
of snapshot during one corner case of logical decoding. For reasons lost
to time, I didn't continue to pass NULL to systable_beginscan() in the
plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
Note the missing RegisterSnapshot()...
That's bad because:
a) If invalidation processing triggers a InvalidateCatalogSnapshot(),
the contained SnapshotResetXmin() may find no other snapshot, and
reset ->xmin. Which then may cause relevant row versions to be
b) If there's a subsequent GetCatalogSnapshot() during invalidation
processing, that will GetSnapshotData() into the snapshot currently
The fix itself is trivial, just pass NULL for the normal case, rather
than doing GetCatalogSnapshot().
But I think this points to some severe holes in relevant assertions /
1) Asserting that RecentGlobalXmin is set - like many routines do -
isn't meaningful, because it stays set even if SnapshotResetXmin()
releases the transaction's snapshot. These are fairly old assertions
(d53a56687f3d). As far as I can tell these routines really should
verify that a snapshot is set.
2) I think we need to reset TransactionXmin, RecentXmin whenever
SnapshotResetXmin() clears xmin. While we'll set them again the next
time a snapshot is acquired, the fact that they stay set seems likely
to hide bugs.
We also could remove TransactionXmin and instead use the
pgproc/pgxact's ->xmin. I don't really see the point of having it?
3) Similarly, I think we ought to reset reset RecentGlobal[Data]Xmin at
the end of the transaction or such.
But I'm not clear what protects those values from being affected by
wraparound in a longrunning transaction? Initially they are obviously
protected against that due to the procarray - but once the oldest
procarray entry releases its snapshot, the global xmin horizon can
advance. That allows transactions that up to ~2 billion into the
future of the current backend, whereas RecentGlobalXmin might be
nearly ~2 billion transactions in the past relative to ->xmin.
That might not have been a likely problem many years ago, but seems
far from impossible today?
I propose to commit a fix, but then also add an assertion for
TransactionIdIsValid(MyPgXact->xmin) instead (or in addition) to the
TransactionIdIsValid(RecentGlobalXmin) tests right now. And in master
clear the various *Xmin variables whenever we reset xmin.
I think in master we should also start to make RecentGlobalXmin etc
FullTransactionIds. We can then convert the 32bit xids we compare with
RecentGlobal* to 64bit xids (which is safe, because live xids have to be
within [oldestXid, nextXid)). I have that as part of another patch
|Next Message||Pavel Stehule||2020-02-29 05:43:54||proposal \gcsv|
|Previous Message||David Zhang||2020-02-29 05:23:49||Re: Making psql error out on output failures|