RE: recoveryStopsAfter is not usable when recovery_target_inclusive is false

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Michael Paquier' <michael(at)paquier(dot)xyz>
Cc: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: recoveryStopsAfter is not usable when recovery_target_inclusive is false
Date: 2025-07-24 11:19:02
Message-ID: OSCPR01MB14966BB49642F837FF757088CF55EA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Michael,

Sorry for the late reply.

> So, what you are doing here is changing the behavior of the check in
> recoveryStopsAfter(), with the addition of one exit path based on
> EndRecPtr if recovery_target_inclusive is false, to leave a bit
> earlier in case if we don't have a follow-up record.

Yes, that's right.

> Did you notice a speedup once you did that in your logirep workflow?

No, I have not verified the point. I feel it may not speed up for normal workload,
except last WAL does not exist. I noticed this point while working on the test
issue on pg_createsubscriber like 03b08c8f5 and [1].

> I am afraid that
> this change would be a new mode, skipping a couple of code paths if
> one decides to trigger the exit. And I highly suspect that you could
> break some cases that worked until now here, skipping one
> recoveryPausesHere() and one ProcessStartupProcInterrupts() to say the
> least.

Hmm. Yes, we must investigate the side-effect deeper.

>
> > I feel the process can exit bit earliyer, by comparing with the end point of the
> > applied record and recovery_target_lsn.
> > Attached patch roughly implemented the idea.
>
> Could you prove your point with a test or something that justifies a
> new definition?

One of my colleagues is creating the test to show the beneficial workload,
could you please see it?

>
> > I read the old discussions, but I cannot find the reason of current style.
>
> 35250b6ad7a8 was quite some time ago, as of this thread, with my
> fingerprints all over it:
> https://www.postgresql.org/message-id/flat/CAB7nPqRKKyZQhcB39mty9C_gf
> B0ODRUQZEWB4GPP8ARpdAB%3DZA%40mail.gmail.com

Yeah, I have found it but recoveryStopsAfter() has the same condition since the first
patch...

> If my memory serves me right, this comes down to the consistency with
> how stop XID targets are handled before and after records are read,
> except that this one applies to ReadRecPtr.

I may find the part you're referring:

```
/*
* There can be only one transaction end record with this exact
* transactionid
*
* when testing for an xid, we MUST test for equality only, since
* transactions are numbered in the order they start, not the order
* they complete. A higher numbered xid will complete before you about
* 50% of the time...
*/
if (recoveryTarget == RECOVERY_TARGET_XID && recoveryTargetInclusive &&
recordXid == recoveryTargetXid)
```

One difference between LSN and XID is the predictability. Regarding the WAL
record, the startpoint of the next WAL record is surely next to the endpoint of
previous WAL. In terms of transaction id, however, the commit ordering can be
different with the number. COMMIT for xid=400 may come after another commit for
xid=402. Thus we cannot predict whether we should finish the recovery after
applying one transaction, when recovery_target_inclusive = false and
recovery_target_xid is set.

[1]: https://www.postgresql.org/message-id/CANhcyEVqFCNhrbkCJwOpT1Su5-D3s%2BkSsOoc-4edKc7rzbRfeA%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-07-24 11:19:07 Re: index prefetching
Previous Message Fabrice Chapuis 2025-07-24 11:03:02 roles management problem after upgrading in PG 17