Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Date: 2023-03-18 10:33:29
Message-ID: CAFBsxsEJCi4nuAy_HFVkhLWGOqYbdMuDEOiQ274XCuDWKxLzNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for picking up this badly-needed topic again! I was irresponsible
last year and let it fall off my radar, but I'm looking at the patches, as
well as revisiting discussions from the last four (!?) years that didn't
lead to action.

0001:

+ In this condition the system can still execute read-only transactions.
+ The active transactions will continue to execute and will be able to
+ commit.

This is ambiguous. I'd first say that any transactions already started can
continue, and then say that only new read-only transactions can be started.

0004:

-HINT: Stop the postmaster and vacuum that database in single-user mode.
+HINT: VACUUM or VACUUM FREEZE that database.

VACUUM FREEZE is worse and should not be mentioned, since it does
unnecessary work. Emergency vacuum is not school -- you don't get extra
credit for doing unnecessary work.

Also, we may consider adding a boxed NOTE warning specifically against
single-user mode, especially if this recommendation will change in at least
some minor releases so people may not hear about it. See also [1].

- * If we're past xidStopLimit, refuse to execute transactions, unless
- * we are running in single-user mode (which gives an escape hatch
- * to the DBA who somehow got past the earlier defenses).
+ * If we're past xidStopLimit, refuse to allocate new XIDs.

This patch doesn't completely get rid of the need for single-user mode, so
it should keep all information about it. If a DBA wanted to e.g. drop or
truncate a table to save vacuum time, it is still possible to do it in
single-user mode, so the escape hatch is still useful.

In swapping this topic back in my head, I also saw [2] where Robert
suggested

"that old prepared transactions and stale replication
slots should be emphasized more prominently. Maybe something like:

HINT: Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."

That sounds like a good direction to me. There is more we could do here to
make the message more specific [3][4][5], but the patches here are in the
right direction.

Note for possible backpatching: It seems straightforward to go back to
PG14, which has the failsafe, but we should have better testing in place
first. There is a patch in this CF to make it easier to get close to
wraparound, so I'll look at what it does as well.

[1]
https://www.postgresql.org/message-id/CA%2BTgmoadjx%2Br8_gGbbnNifL6vEyjZntiQRPzyixrUihvtZ5jdQ%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com
[3]
https://www.postgresql.org/message-id/20190504023015.5mgpbl27tld4irw5%40alap3.anarazel.de
[4]
https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
[5]
https://www.postgresql.org/message-id/20220220045757.GA3733812%40rfd.leadboat.com

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-03-18 12:06:38 Re: Initial Schema Sync for Logical Replication
Previous Message Amit Kapila 2023-03-18 10:11:09 Re: Allow logical replication to copy tables in binary format