Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, duffieldzane(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load
Date:	2025-07-23 09:19:42
Message-ID:	CAA4eK1LCOaZBb5XtKqcPi0v6kJrKMZ-P8wpOCQxwy=cBTogurQ@mail.gmail.com
Lists:	pgsql-bugs

On Tue, Jul 22, 2025 at 4:54 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Thu, 10 Jul 2025 at 12:33, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > We should find out in which case and why the consistent_lsn is the
> > start LSN of a commit record. We use the slot's confirmed_flush LSN
> > as the consistent_lsn, which normally should be the end point of a
> > running_xacts record or the commit_end LSN (in case the client sends
> > an ack).
> >
> I checked it and here is my analysis:
>
> When we create a slot, it returns the confirmed_flush LSN as the
> consistent_lsn. I noticed that, in general, when we create a slot,
> confirmed_flush is set to the end of a RUNNING_XACTS record, or in
> other words the start of the next record. That next record can be
> anything. It can be a COMMIT record for a transaction in another
> session.
>
...
> waldump record:
> rmgr: Standby len (rec/tot): 70/ 70, tx: 0, lsn:
> 0/03CBCC58, prev 0/03CBCC18, desc: RUNNING_XACTS nextXid 1370
> latestCompletedXid 1364 oldestRunningXid 1365; 5 xacts: 1366 1365 1369
> 1368 1367
> The consistent point is found at "0/3CBCC58".
>
> When the slot is created, confirmed_flush is set inside the function
> "DecodingContextFindStartpoint" using:
> slot->data.confirmed_flush = ctx->reader->EndRecPtr;
> In our case the value of consistent_lsn is "0/03CBCCA0" (I added some
> logs and got the value). Logs:
> 2025-07-20 16:50:18.039 IST [1780326] port=5340
> ubuntu(at)test_db/[unknown] LOG: #### confirmed_flush = 0/03CBCCA0
> inside DecodingContextFindStartpoint
> 2025-07-20 16:50:18.039 IST [1780326] port=5340
> ubuntu(at)test_db/[unknown] STATEMENT: SELECT lsn FROM
> pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
> false, false, false)
>
> This consistent_lsn "0/03CBCCA0" is simply the end of the
> RUNNING_XACTS record (whose start is "0/3CBCC58").
>
> While the slot is being created, a transaction in a concurrent session
> commits (just after the third RUNNING_XACTS record) and adds a COMMIT
> record:
> rmgr: Transaction len (rec/tot): 46/ 46, tx: 1369, lsn:
> 0/03CBCCA0, prev 0/03CBCC58, desc: COMMIT 2025-07-20 16:50:18.031146
> IST
>
> So, in such cases the consistent LSN can be set to a COMMIT record.
>

Your analysis and proposed patch look good to me. I'll push this
patch tomorrow unless Euler or someone else thinks otherwise.
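
The relationship between the RUNNING_XACTS record and the consistent_lsn
in the quoted analysis can be checked with a little arithmetic. The
following is a minimal sketch, not PostgreSQL code: the names
sketch_align_up and sketch_end_of_record are hypothetical, and it
assumes 8-byte maximum alignment and a record that does not cross a WAL
page boundary.

```c
#include <stdint.h>

/* Assumption: 8-byte maximum alignment, as on common 64-bit platforms. */
#define SKETCH_MAXALIGN 8u

/* Round an LSN up to the next multiple of SKETCH_MAXALIGN. */
static uint64_t
sketch_align_up(uint64_t lsn)
{
    return (lsn + SKETCH_MAXALIGN - 1) & ~((uint64_t) SKETCH_MAXALIGN - 1);
}

/*
 * End position of a record that starts at 'start' and has total length
 * 'tot_len'. This is also where the next record would start. Page
 * headers are ignored, so the record must fit within one WAL page.
 */
static uint64_t
sketch_end_of_record(uint64_t start, uint32_t tot_len)
{
    return sketch_align_up(start + tot_len);
}
```

With the values from the waldump output (start 0/03CBCC58, total length
70), sketch_end_of_record(0x03CBCC58, 70) yields 0x03CBCCA0, which is
the LSN at which the COMMIT record of transaction 1369 starts.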

> > If we decide to fix it in the way proposed here, then we also need
> > to investigate whether we still need the additional WAL record added
> > by commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc. The reason why
> > that additional WAL record was added is discussed in email [1].
> >
> > [1] - https://www.postgresql.org/message-id/flat/2377319.1719766794%40sss.pgh.pa.us#bba9f5ee0efc73151cc521a6bd5182ed
>
> I reverted the change added by commit
> 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc, applied my patch, and
> checked the behaviour. I am able to reproduce the issue that commit
> was resolving, so I think the change is still required.
> It is still required because, while recovery is performed in the
> function 'PerformWalRecovery' with recovery_target_inclusive set to
> false, the function 'recoveryStopsBefore' is responsible for deciding
> whether recovery is finished. This function sets
> 'reachedRecoveryTarget' to true when the following condition is
> satisfied:
> /* Check if target LSN has been reached */
> if (recoveryTarget == RECOVERY_TARGET_LSN &&
>     !recoveryTargetInclusive &&
>     record->ReadRecPtr >= recoveryTargetLSN)
> Here we are checking if "start of the record" >= recoveryTargetLSN.
>
> When a replication slot is created, consistent_lsn is obtained. Since
> this consistent_lsn points to the end of a record (or, equivalently,
> the start of the next record), there may not yet be a WAL record
> that starts at the consistent_lsn. In that case recovery waits until
> it reads a record at or after the consistent_lsn (in my testing this
> wait was around 20 seconds), and this wait can cause the timeout
> issue.
> I have manually debugged and checked the above case, and I think the
> change in commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc is still
> needed.
>

Agreed.

Thanks for the detailed analysis.
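
The reason the extra WAL record matters can be illustrated with a small
model of the quoted stop condition. This is only a sketch under stated
assumptions: sketch_stops_before and sketch_first_stop are hypothetical
names, the recovery target type is assumed to be an LSN, and WAL is
modelled as a plain array of record start positions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Models the quoted check for an LSN target: with an exclusive target,
 * recovery stops before a record whose start is at or past the target.
 */
static bool
sketch_stops_before(uint64_t read_rec_ptr, uint64_t target_lsn,
                    bool target_inclusive)
{
    return !target_inclusive && read_rec_ptr >= target_lsn;
}

/*
 * Walk the record start positions in order. Returns the index of the
 * first record that satisfies the stop condition, or -1 if no available
 * record does, in which case recovery would have to wait for more WAL.
 */
static int
sketch_first_stop(const uint64_t *rec_starts, size_t nrecs,
                  uint64_t target_lsn)
{
    for (size_t i = 0; i < nrecs; i++)
    {
        if (sketch_stops_before(rec_starts[i], target_lsn, false))
            return (int) i;
    }
    return -1;
}
```

If the last available record starts at 0/03CBCC58 and the target is
0/03CBCCA0, sketch_first_stop returns -1: nothing starts at or after the
target yet, which corresponds to the wait described above. Once any
record starting at 0/03CBCCA0 exists, the condition is met at that
record.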

--
With Regards,
Amit Kapila.

In response to

Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load at 2025-07-22 11:24:29 from Shlok Kyal
