| From: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
|---|---|
| To: | Jan Nidzwetzki <jnidzwetzki(at)gmx(dot)de> |
| Cc: | Matt Blewitt <mble(at)planetscale(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [PATCH] Fix PITR pause bypass when initial XLOG_RUNNING_XACTS has subxid overflow |
| Date: | 2026-06-12 22:30:33 |
| Message-ID: | CAN4CZFN48ew-BKNu_hAVaKtnLMu8if78k5Au-jQ6uJ=JEoHpPw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hello
> This is safe because replay is frozen at this
> point: the only ways out of the pause are promotion and shutdown, so no
> transaction's commit status can change afterwards, and any transaction a
> query finds committed in CLOG necessarily committed before that query's
> snapshot.
But if I look at the documentation, after shutdown it allows a restart
with a later recovery target:
> The intended use of the pause setting is to allow queries to be executed
> against the database to check if this recovery target is the most desirable
> point for recovery. The paused state can be resumed by using pg_wal_replay_resume()
> (see Table 9.81), which then causes recovery to end. If this recovery target is
> not the desired stopping point, then shut down the server, change the recovery
> target settings to a later target and restart to continue recovery.
"so no transaction's commit status can change afterwards" is
true within the lifetime of the paused instance, but what happens if I
shut down and restart the server with a later recovery target?
Even a read-only query can mark a tuple with HEAP_XMIN_INVALID if
HeapTupleSatisfiesMVCC decides that a transaction aborted or crashed.
And then in bufmgr.c:MarkSharedBufferDirtyHint, we can see the
following conditions, where an early return prevents this change from
being flushed to disk:
    if (XLogHintBitIsNeeded() && (lockstate & BM_PERMANENT))
    {
        /*
         * If we must not write WAL, due to a relfilelocator-specific
         * condition or being in recovery, don't dirty the page. We can
         * set the hint, just not dirty the page as a result so the hint
         * is lost when we evict the page or shutdown.
         *
         * See src/backend/storage/page/README for longer discussion.
         */
        if (RecoveryInProgress() ||
            RelFileLocatorSkippingWAL(BufTagGetRelFileLocator(&bufHdr->tag)))
            return;
    ...
Where
    #define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
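So if we turn off both wal_log_hints and data checksums,
XLogHintBitIsNeeded() is false, that early return is never reached, and
the page is dirtied and can be written out with the hint bit set. With
the patch, a plain SELECT in the paused state can then cause data
corruption: after a restart with a later recovery target, the hint bit
on disk still says the transaction aborted, even if replay goes on to
commit it.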
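To make the condition easier to follow, here is a small standalone model of that check. It is only a sketch: HintContext, hint_dirties_page and check_examples are made-up names for illustration, the struct fields merely stand in for the real server state, and everything else MarkSharedBufferDirtyHint does (locking, the already-dirty case, WAL-logging the hint) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for server state; these are not PostgreSQL APIs. */
typedef struct HintContext
{
    bool wal_log_hints;     /* the wal_log_hints setting */
    bool data_checksums;    /* stands in for DataChecksumsNeedWrite() */
    bool permanent_buffer;  /* stands in for the BM_PERMANENT flag */
    bool in_recovery;       /* stands in for RecoveryInProgress() */
    bool rel_skipping_wal;  /* stands in for RelFileLocatorSkippingWAL() */
} HintContext;

/*
 * Simplified model of the quoted condition: returns true when setting a
 * hint bit would also dirty the page, so the hint can reach disk.
 */
static bool
hint_dirties_page(const HintContext *ctx)
{
    /* Mirrors the XLogHintBitIsNeeded() macro. */
    bool hint_bit_is_needed = ctx->wal_log_hints || ctx->data_checksums;

    if (hint_bit_is_needed && ctx->permanent_buffer)
    {
        /* Early return: the hint is set in memory but the page stays clean. */
        if (ctx->in_recovery || ctx->rel_skipping_wal)
            return false;
    }

    /* Otherwise the page is marked dirty and may be flushed later. */
    return true;
}

/* Paused standby, permanent relation, with and without the protection. */
void
check_examples(void)
{
    HintContext ctx = {
        .wal_log_hints = false,
        .data_checksums = false,
        .permanent_buffer = true,
        .in_recovery = true,
        .rel_skipping_wal = false,
    };

    /* Both settings off: the early return is skipped, the page is dirtied. */
    assert(hint_dirties_page(&ctx));

    /* Either setting on: recovery triggers the early return. */
    ctx.data_checksums = true;
    assert(!hint_dirties_page(&ctx));

    ctx.data_checksums = false;
    ctx.wal_log_hints = true;
    assert(!hint_dirties_page(&ctx));
}
```

In this model the only thing keeping the hint off disk during recovery is the inner early return, and it is reachable only when wal_log_hints or data checksums are enabled.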
See the attached TAP test that showcases the problem.
| Attachment | Content-Type | Size |
|---|---|---|
| subxid_corruption.pl | application/octet-stream | 7.3 KB |