Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Date: 2023-03-31 07:38:55
Message-ID: CAFBsxsH2DEB67gnrh272EF7pJonc29b8i8yRM7W3XqpyL6UpKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 21, 2023 at 6:44 PM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:

Okay, the changes look good. To go further, I think we need to combine into
two patches, one with 0001-0003 and one with 0004:

1. Correct false statements about "shutdown" etc. This should contain
changes that can safely be patched all the way to v11.
2. Change bad advice (single-user mode) into good advice. We can target
head first, and then try to adopt as far back as we safely can (and test).

(...and future work, so not part of the CF here) 3. Tell the user what
caused the problem, instead of saying "go figure it out yourself".

> > In swapping this topic back in my head, I also saw [2] where Robert
suggested
> >
> > "that old prepared transactions and stale replication
> > slots should be emphasized more prominently. Maybe something like:
> >
> > HINT: Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide
VACUUM."
>
> It looks like the hint regarding replication slots was added at some
> point. Currently we have:
>
> ```
> errhint( [...]
> "You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.")));
> ```

Yes, the exact same text as it appeared in the [2] thread above, which
prompted Robert's comment I quoted. I take the point to mean: All of these
things need to be taken care of *first*, before vacuuming, so the hint
should order things so that it is clear.

> Please let me know if you think
> we should also add a suggestion to kill long-running sessions, etc.

+1 for also adding that.

- errmsg("database is not accepting commands to avoid wraparound data loss
in database \"%s\"",
+ errmsg("database is not accepting commands that generate new XIDs to
avoid wraparound data loss in database \"%s\"",

I'm not quite on board with the new message, but welcome additional
opinions. For one, it's a bit longer and now ambiguous. I also bet that
"generate XIDs" doesn't really communicate anything useful. The people who
understand exactly what that means, and what the consequences are, are
unlikely to let the system get near wraparound in the first place, and
might even know enough to ignore the hint.

I'm thinking we might need to convey something about "writes". While it's
less technically correct, I imagine it's more useful. Remember, many users
have it drilled in their heads that they need to drop immediately to
single-user mode. I'd like to give some idea of what they can and cannot do.

+ Previously it was required to stop the postmaster and VACUUM the
database
+ in a single-user mode. There is no need to use a single-user mode
anymore.

I think we need to go further and actively warn against it: It's slow,
impossible to monitor, disables replication and disables safeguards against
wraparound. (Other bad things too, but these are easily understandable for
users)

Maybe mention also that it's main use in wraparound situations is for a way
to perform DROPs and TRUNCATEs if that would help speed up resolution.

I propose for discussion that 0004 should show in the docs all the queries
for finding prepared xacts, repl slots etc. If we ever show the info at
runtime, we can dispense with the queries, but there seems to be no urgency
for that...

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2023-03-31 07:45:51 Re: regression coverage gaps for gist and hash indexes
Previous Message Jeff Davis 2023-03-31 07:36:14 Re: running logical replication as the subscription owner