Fix race condition in SSI when reading PredXact->SxactGlobalXmin

From: Josh Curtis <jcurtis825(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Fix race condition in SSI when reading PredXact->SxactGlobalXmin
Date: 2025-09-04 22:09:57
Message-ID: CAEkjoh2X8yL9TKrQKx6SL6gtMNuZsyH8K-+rE78EwdOv4oXjSA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

DropAllPredicateLocksFromTable, PredicateLockPageSplit,
and CheckTableForSerializableConflictIn all assume that if
PredXact->SxactGlobalXmin is invalid there are no active serializable
transactions and that they can return early as there's no work to do. They
don't acquire a LWLock on SerializableXactHashLock because they already
have locks on the relevant buffer page or relation, which should protect
them from conflicts with new concurrent transactions. However, this
justification doesn't protect against concurrent updates of SxactGlobalXmin
which SetNewSxactGlobalXmin can do.

SetNewSxactGlobalXmin sets SxactGlobalXmin to InvalidTransactionId then
iterates over the active serializable transactions searching for a new
xmin.

static void
SetNewSxactGlobalXmin(void)
{
...
PredXact->SxactGlobalXmin = InvalidTransactionId;
PredXact->SxactGlobalXminCount = 0;

dlist_foreach(iter, &PredXact->activeList)
{
// find new xmin
}
...
}

If SetNewSxactGlobalXmin is interrupted before a new xmin has been set,
processes that read SxactGlobalXmin during this time will see
InvalidTransactionId and might incorrectly conclude there are no concurrent
serializable transactions. In the case of PredicateLockPageSplit, it will
return early and fail to transfer over information about SIRead locks on
the old page as a result.

I have a small repo that demonstrates serialization anomalies when
PredicateLockPageSplit is incorrectly skipped (they will show up much more
frequently with a small sleep after SetNewSxactGlobalXmin sets
SxactGlobalXmin to invalid, but they still show up eventually on normal
builds). [1]

I see a couple of options to fix this

1. Guard the reads with a lock. I've attached a patch to do this.

2. Update SetNewSxactGlobalXmin so concurrent lockless reads
of SxactGlobalXmin will never see an invalid transaction ID unless there
genuinely are no more active serializable transactions. To do this, delay
assigning PredXact->SxactGlobalXmin until all active transactions have been
examined.

static void
SetNewSxactGlobalXmin(void)
{
...
Assert(TransactionIdIsValid(PredXact->SxactGlobalXmin))
TransactionId xmin = InvalidTransactionId;
....
dlist_foreach(iter, &PredXact->activeList)
{
// iterate over transactions and find new xmin
}
PredXact->SxactGlobalXmin = xmin;
....
}

This is definitely a bit more complex. It requires that
SetNewSxactGlobalXmin is never called when SxactGlobalXmin is invalid to
prevent readers from seeing an invalid transaction ID when they should see
a valid one -- I think this is the case now since before
SetNewSxactGlobalXmin is called postgres checks that
PredXact->SxactGlobalXminCount > 0. I assume this entails that
SxactGlobalXmin is valid, but I have not checked every place the two
variables are modified.

I imagine the vast majority of postgres instances aren't using serializable
isolation, so I figured I'd throw this out as a possible optimization. I'm
not sure how the project trades off between simplicity and (very minor?)
performance optimizations, so I'll defer to the community.

Best,
Josh Curtis

[1] https://github.com/joshcurtis/ssi_race_condition_reproduction

Attachment Content-Type Size
v1-0001-Fix-race-condition-when-reading-PredXact-SxactGlo.patch application/octet-stream 3.4 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-09-04 22:30:33 Re: Logical Replication of sequences
Previous Message Tom Lane 2025-09-04 21:52:49 Re: [PATCH] Fix ALTER SYSTEM empty string bug for GUC_LIST_QUOTE parameters