Re: [PATCH] Fix PITR pause bypass when initial XLOG_RUNNING_XACTS has subxid overflow

From: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
To: Jan Nidzwetzki <jnidzwetzki(at)gmx(dot)de>
Cc: Matt Blewitt <mble(at)planetscale(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Fix PITR pause bypass when initial XLOG_RUNNING_XACTS has subxid overflow
Date: 2026-06-12 22:30:33
Message-ID: CAN4CZFN48ew-BKNu_hAVaKtnLMu8if78k5Au-jQ6uJ=JEoHpPw@mail.gmail.com
Lists: pgsql-hackers

Hello

> This is safe because replay is frozen at this
> point: the only ways out of the pause are promotion and shutdown, so no
> transaction's commit status can change afterwards, and any transaction a
> query finds committed in CLOG necessarily committed before that query's
> snapshot.

But according to the documentation, after a shutdown the server can be
restarted with a later recovery target:

> The intended use of the pause setting is to allow queries to be executed
> against the database to check if this recovery target is the most desirable
> point for recovery. The paused state can be resumed by using pg_wal_replay_resume()
> (see Table 9.81), which then causes recovery to end. If this recovery target is
> not the desired stopping point, then shut down the server, change the recovery
> target settings to a later target and restart to continue recovery.

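"so no transaction's commit status can change afterwards" is true
within the lifetime of the paused instance, but what happens if I shut
down and restart the server with a later recovery target? Replay then
continues, and a transaction that was still in progress at the pause
point can commit.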

Even a read-only query can set HEAP_XMIN_INVALID on a tuple if
HeapTupleSatisfiesMVCC decides that the inserting transaction aborted
or crashed. In bufmgr.c:MarkSharedBufferDirtyHint, the following early
return normally prevents that hint from being flushed during recovery:

    if (XLogHintBitIsNeeded() && (lockstate & BM_PERMANENT))
    {
        /*
         * If we must not write WAL, due to a relfilelocator-specific
         * condition or being in recovery, don't dirty the page. We can
         * set the hint, just not dirty the page as a result so the hint
         * is lost when we evict the page or shutdown.
         *
         * See src/backend/storage/page/README for longer discussion.
         */
        if (RecoveryInProgress() ||
            RelFileLocatorSkippingWAL(BufTagGetRelFileLocator(&bufHdr->tag)))
            return;
        ...

Where

#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())

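So if both wal_log_hints and data checksums are turned off, the outer
condition is false and that early return is never reached, even during
recovery. The page can then be dirtied and the hint bit can reach disk.
With the patch, a plain SELECT in the paused state is enough to cause
data corruption once recovery continues to a later target.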
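
To make the condition easier to follow, here is a minimal standalone sketch of just the quoted check. It is not PostgreSQL code: the HintContext struct and both function names are hypothetical, and each field only stands in for the server call named in its comment. Whatever follows the elided part of MarkSharedBufferDirtyHint is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for server state; none of this is a PostgreSQL API. */
typedef struct HintContext
{
    bool wal_log_hints;        /* models the wal_log_hints setting */
    bool data_checksums;       /* models DataChecksumsNeedWrite() */
    bool buffer_is_permanent;  /* models the BM_PERMANENT flag test */
    bool in_recovery;          /* models RecoveryInProgress() */
    bool relation_skips_wal;   /* models RelFileLocatorSkippingWAL(...) */
} HintContext;

/* Mirrors the quoted macro: wal_log_hints || DataChecksumsNeedWrite(). */
bool
hint_bit_wal_needed(const HintContext *ctx)
{
    return ctx->wal_log_hints || ctx->data_checksums;
}

/*
 * Returns true when the quoted early return is NOT taken, that is, when
 * the hint-bit change gets past this check and may go on to dirty the
 * page. Returns false when the early return fires and the hint is lost
 * on eviction or shutdown.
 */
bool
hint_passes_quoted_check(const HintContext *ctx)
{
    if (hint_bit_wal_needed(ctx) && ctx->buffer_is_permanent)
    {
        if (ctx->in_recovery || ctx->relation_skips_wal)
            return false;   /* corresponds to the quoted "return;" */
    }
    return true;            /* outer condition false, or no reason to skip */
}
```

In this model, a paused standby (in_recovery = true) with wal_log_hints and data_checksums both false passes the check, while enabling either setting makes the early return fire for a permanent buffer.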

See the attached tap test that showcases the problem.

Attachment: subxid_corruption.pl (application/octet-stream, 7.3 KB)
