Re: pg_rewind with cascade standby doesn't work well

From: Kuwamura Masaki <kuwamura(at)db(dot)is(dot)i(dot)nagoya-u(dot)ac(dot)jp>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind with cascade standby doesn't work well
Date: 2023-09-20 02:46:45
Message-ID: CAMyC8qryE7mKyvPvGHCt5GpANAmp8sS_tRbraqXcPBx14viy6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> IMO a test is needed that makes sure no one is going to break this in
>> the future.
>
> You definitely need more complex test scenarios for that. If you can
> come up with new ways to make the TAP tests of pg_rewind mode modular
> in handling more complicated node setups, that would be a nice
> addition, for example.

I'm sorry for lacking tests. For now, I started off with a simple test
that cause the problem I mentioned. The updated WIP patch 0001 includes
the new test for pg_rewind.
And also, I'm afraid that I'm not sure what kind of tests I have to make
for fix this behavior. Would you mind giving me some advice?

> This patch is at least incorrect in its handling of crash recovery,
> because these two should *never* be set in this case as we want to
> replay up to the end of WAL. For example, see xlog.c or the top of
> xlogrecovery.c about the assumptions behind these variables:
> /* crash recovery should always recover to the end of WAL */
> ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> ControlFile->minRecoveryPointTLI = 0;
>
> If an end-of-recovery record is replayed during crash recovery, these
> assumptions are plain broken.

That make sense! I really appreciate your knowledgeable review.

> One thing that we could consider is to be more aggressive with
> restartpoints when replaying this record for a standby, see a few
> lines above the lines added by your patch, for example. And we could
> potentially emulate a post-promotion restart point to get a refresh of
> the control file as it should, with the correct code paths involved in
> the updates of minRecoveryPoint when the checkpointer does the job.

I'm not confident but you meant we could make restartpoint
(i.e., call `RequestCheckpoint()`) instead of my old patch?
The patch 0001 also contains my understanding.

I also found a bug (maybe). If we call `CreateRestartPoint()` during
crash-recovery, the assertion fails at ComputeXidHorizon() in procarray.c.
It's inherently orthogonal to the problem I already reported. So you can
reproduce this at HEAD with this procedure.

1. Start primary and standby server
2. Modify checkpoint_timeout to 1h on standby
3. Insert 10^10 records and concurrently run CHECKPOINT every second on
primary
4. Do an immediate stop on both standby and primary at the end of the insert
5. Modify checkpoint_timeout to 30 on standby
6. Remove standby.signal on standby
7. Restart standby (it will start crash-recovery)
8. Assertion failure is raised shortly

I think this is because `TruncateSUBTRANS();` in `CreateRestartPoint()` is
called but `StartupSUBTRANS()` isn't called yet. In `StartupXLOG()`, we
call
`StartupSUBTRANS()` if `(ArchiveRecoveryRequested && EnableHotStandby)`.
However, in `CreateRestartPoint()`, we call `TruncateSUBTRANS()` if
`(EnableHotStandby)`. I guess the difference causes this bug. The latter
possibly be called even crash-recovery while former isn't.
The attached patch 0002 fixes it. I think we could discuss about this bug
in
another thread if needed.

Best regards!

Masaki Kuwamura

Attachment Content-Type Size
v2-0001-pg_rewind-Fix-bug-using-cascade-standby-as-source.patch application/octet-stream 5.4 KB
v1-0002-Fix-restartpoint-during-crash-recovery.patch application/octet-stream 593 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Jungwirth 2023-09-20 02:50:10 Re: SQL:2011 application time
Previous Message Nathan Bossart 2023-09-20 02:30:33 Re: Inefficiency in parallel pg_restore with many tables