Re: [BUG] recovery of prepared transactions during promotion can fail

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: julian(dot)markwort(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] recovery of prepared transactions during promotion can fail
Date: 2023-06-19 05:41:54
Message-ID: 20230619.144154.1628674635368115138.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 19 Jun 2023 14:24:44 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, Jun 16, 2023 at 04:27:40PM +0200, Julian Markwort wrote:
> > Note that it is important that the PREPARE entry is in the WAL file
> > that PostgreSQL is writing to prior to the inital crash.
> > This has happened repeatedly in production already with a customer
> > that uses prepared transactions quite frequently. I assume that
> > this has happened for others too, but the circumstances of the crash
> > and the cause are very dubious, and troubleshooting it is pretty
> > difficult.
>
> I guess that this is a possibility yes. I have not heard directly
> about such a report, but perhaps that's just because few people use
> 2PC.

+1

> > This behaviour has - apparently unintentionally - been fixed in PG
> > 15 and upwards (see commit 811051c ), as part of a general
> > restructure and reorganization of this portion of xlog.c (see commit
> > 6df1543 ).
> >
> > Furthermore, it seems this behaviour does not appear in PG 12 and
> > older, due to another possible bug: In PG 13 and newer, the
> > XLogReaderState is reset in XLogBeginRead() before reading WAL in
> > XlogReadTwoPhaseData() in twophase.c .
> > In the older releases (PG <= 12), this reset is not done, so the
> > requested LSN containing the prepared transaction can (by happy
> > coincidence) be read from in-memory buffers, and PostgreSQL
> > consequently manages to come up just fine (as the WAL has already
> > been read into buffers prior to the .partial rename). If the older
> > releases also where to properly reset the XLogReaderState, they
> > would also fail to find the LSN on disk, and hence PostgreSQL would
> > crash again.
>
> That's debatable, but I think that I would let v12 and v11 be as they
> are. v11 is going to be end-of-life soon and we did not have any
> complains on this matter as far as I know, so there is a risk of
> breaking something upon its last release. (Got some, Err..
> experiences with that in the past). On REL_11_STABLE, note for
> example the slight difference with the handling of
> recovery_end_command, where we rely on InRecovery rather than
> ArchiveRecoveryRequested. REL_12_STABLE is in a more consistent shape
> than v11 regarding that.

Agree about 11, it's no use patching. About 12, I slightly prefer
applying this but I'm fine without it since no actual problem are
seen.

> > I've attached patches for PG 14 and PG 13 that mimic the change in
> > PG15 (commit 811051c ) and reorder the crucial events, placing the
> > recovery of prepared transactions *before* renaming the file.
>
> Yes, I think that's OK. I would like to add two things to your
> proposal for all the existing branches.
> - Addition of a comment where RecoverPreparedTransactions() is called
> at the end of recovery to tell that we'd better do that before working
> on the last partial segment of the old timeline.
> - Enforce the use of archiving in 009_twophase.pl.

Both look good to me.

> > My humble opinion is that this fix should be backpatched to PG 14
> > and PG 13. It's debatable whether the fix needs to be brought back
> > to 12 and older also, as those do not exhibit this issue, but the
> > order of renaming is still wrong.
>
> Yeah, I'd rather wait for somebody to complain about that. And v11 is
> not worth taking risks with at this time of the year, IMHO.

I don't have a complaint as the whole.

> With your fix included, the patch for REL_14_STABLE would be like the
> attached. Is that OK for you?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-06-19 06:02:53 Re: Synchronizing slots from primary to standby
Previous Message Michael Paquier 2023-06-19 05:37:48 Re: Allow pg_archivecleanup to remove backup history files