From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, duffieldzane(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load
Date: 2025-07-22 11:24:29
Message-ID: CANhcyEVqFCNhrbkCJwOpT1Su5-D3s+kSsOoc-4edKc7rzbRfeA@mail.gmail.com
Lists: pgsql-bugs
On Thu, 10 Jul 2025 at 12:33, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Apr 29, 2025 at 1:17 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Mon, 28 Apr 2025 at 10:28, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > With this approach, there is a risk of starting from the next WAL
> > > record after the consistent point. For example, if the slot returns a
> > > consistent point at 0/1715E10, after the fix we would begin replaying
> > > from the next WAL record, such as 0/1715E40, which could potentially
> > > lead to data loss.
> > > As an alternative, we could set recovery_target_inclusive to false in
> > > the setup_recovery function. This way, recovery would stop just before
> > > the recovery target, allowing the publisher to start replicating
> > > exactly from the consistent point.
> > > Thoughts?
> >
> > This approach looks better to me.
> > I have prepared the patch for the same.
> >
>
> We should find out in which case and why the consistent_lsn is a start
> point LSN of a commit record. We use the slot's confirmed_flush LSN
> location as the consistent_lsn, which normally should be the end point
> of a running_xacts record or the commit-end LSN (in case the client
> sends an ack).
>
I checked it and here is my analysis:
When we create a slot, it returns the confirmed_flush LSN as the
consistent_lsn. I noticed that, in general, when we create a slot the
confirmed_flush is set to the end of a RUNNING_XACTS record, or we can
say the start of the next record. And this next record can be anything:
it can be a COMMIT record for a transaction in another session.
I have attached server logs and waldump logs for one such case,
reproduced using the test script shared in [1].
The snapbuild machinery has four states: START, BUILDING_SNAPSHOT,
FULL_SNAPSHOT and SNAPBUILD_CONSISTENT. Between each of these states a
RUNNING_XACTS record is logged and decoded.
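These states correspond to the SnapBuildState enum in
src/include/replication/snapbuild.h, shown here in paraphrased form
(the comments below are mine, not the source's):

typedef enum
{
    /* Initial state; nothing can be decoded yet. */
    SNAPBUILD_START = -1,

    /* Collecting committed transactions to build the initial snapshot. */
    SNAPBUILD_BUILDING_SNAPSHOT = 0,

    /*
     * Transactions that start after this point can be fully decoded,
     * but ones already in progress at this point cannot be.
     */
    SNAPBUILD_FULL_SNAPSHOT = 1,

    /* Consistent state reached; decoding can start. */
    SNAPBUILD_CONSISTENT = 2,
} SnapBuildState;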
When the command to create a slot is executed, we get the following
logs for the first RUNNING_XACTS record:
2025-07-20 16:50:18.002 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] LOG: logical decoding found initial starting
point at 0/3C813D8
2025-07-20 16:50:18.002 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] DETAIL: Waiting for transactions
(approximately 2) older than 1353 to end.
2025-07-20 16:50:18.002 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] STATEMENT: SELECT lsn FROM
pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
false, false, false)
The waldump record corresponding to this:
rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
0/03C813D8, prev 0/03C81390, desc: RUNNING_XACTS nextXid 1353
latestCompletedXid 1350 oldestRunningXid 1351; 2 xacts: 1352 1351
For the second RUNNING_XACTS record, we get the following logs:
2025-07-20 16:50:18.007 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] LOG: logical decoding found initial
consistent point at 0/3C97948
2025-07-20 16:50:18.007 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] DETAIL: Waiting for transactions
(approximately 8) older than 1362 to end.
2025-07-20 16:50:18.007 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] STATEMENT: SELECT lsn FROM
pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
false, false, false)
The corresponding waldump record:
rmgr: Standby len (rec/tot): 82/ 82, tx: 0, lsn: 0/03C97948, prev
0/03C97900, desc: RUNNING_XACTS nextXid 1362 latestCompletedXid 1353
oldestRunningXid 1354; 8 xacts: 1357 1358 1355 1354 1359 1361 1356
1360
For the third RUNNING_XACTS record, we get the following logs:
2025-07-20 16:50:18.038 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] LOG: logical decoding found consistent point
at 0/3CBCC58
2025-07-20 16:50:18.038 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] DETAIL: There are no old transactions
anymore.
2025-07-20 16:50:18.038 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] STATEMENT: SELECT lsn FROM
pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
false, false, false)
The corresponding waldump record:
rmgr: Standby len (rec/tot): 70/ 70, tx: 0, lsn:
0/03CBCC58, prev 0/03CBCC18, desc: RUNNING_XACTS nextXid 1370
latestCompletedXid 1364 oldestRunningXid 1365; 5 xacts: 1366 1365 1369
1368 1367
The consistent point is found at "0/3CBCC58".
When the slot is created, the confirmed_flush is set inside the
function "DecodingContextFindStartpoint" using:
slot->data.confirmed_flush = ctx->reader->EndRecPtr;
In our case the value of consistent_lsn is "0/03CBCCA0" (I added some
extra logging to capture this value). Logs:
2025-07-20 16:50:18.039 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] LOG: #### confirmed_flush = 0/03CBCCA0
inside DecodingContextFindStartpoint
2025-07-20 16:50:18.039 IST [1780326] port=5340
ubuntu(at)test_db/[unknown] STATEMENT: SELECT lsn FROM
pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput',
false, false, false)
This consistent_lsn "0/03CBCCA0" is simply the end of the
RUNNING_XACTS record (whose start is "0/3CBCC58").
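For context, the relevant part of DecodingContextFindStartpoint() in
src/backend/replication/logical/logical.c looks roughly like the
sketch below (paraphrased, not an exact copy of the source):

void
DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
{
    ReplicationSlot *slot = ctx->slot;
    char       *err = NULL;

    /* Start reading WAL at the slot's restart_lsn. */
    XLogBeginRead(ctx->reader, slot->data.restart_lsn);

    /* Decode records until the snapshot builder reaches consistency. */
    for (;;)
    {
        XLogReadRecord(ctx->reader, &err);
        /* (error handling elided) */
        LogicalDecodingProcessRecord(ctx, ctx->reader);

        if (DecodingContextReady(ctx))
            break;
    }

    /*
     * EndRecPtr points just past the last record read, i.e. at the
     * start of the *next* record.  In the case above, that next record
     * happened to be the COMMIT of tx 1369.
     */
    slot->data.confirmed_flush = ctx->reader->EndRecPtr;
}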
While the slot is being created, a transaction in a concurrent session
commits (just after the third RUNNING_XACTS record) and adds a COMMIT
record:
rmgr: Transaction len (rec/tot): 46/ 46, tx: 1369, lsn:
0/03CBCCA0, prev 0/03CBCC58, desc: COMMIT 2025-07-20 16:50:18.031146
IST
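To summarize the WAL layout around the consistent point (derived from
the waldump output above):

0/03CBCC58  RUNNING_XACTS      <- "consistent point" reported in the log
0/03CBCCA0  COMMIT of tx 1369  <- end of the RUNNING_XACTS record
                                  = confirmed_flush = consistent_lsn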
So, in such cases the consistent LSN can be the start LSN of a COMMIT
record.
> If we decide to fix it in the way proposed here, then we also need to
> investigate whether we still need the additional WAL record added by
> commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc. The reason why that
> additional WAL record was added is discussed in email [1].
>
> [1] - https://www.postgresql.org/message-id/flat/2377319.1719766794%40sss.pgh.pa.us#bba9f5ee0efc73151cc521a6bd5182ed
I reverted the change added by commit
03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc, applied my patch, and checked
the behaviour. I am able to reproduce the issue that commit was
resolving, so I think the change is still required.
It is still required because, while recovery is performed in the
function 'PerformWalRecovery' with recovery_target_inclusive set to
false, the function 'recoveryStopsBefore' is responsible for deciding
whether recovery is finished. It sets 'reachedRecoveryTarget' to true
when the following condition is satisfied:
    /* Check if target LSN has been reached */
    if (recoveryTarget == RECOVERY_TARGET_LSN &&
        !recoveryTargetInclusive &&
        record->ReadRecPtr >= recoveryTargetLSN)
Here we are checking whether the start of the record is >=
recoveryTargetLSN. When a replication slot is created, a
consistent_lsn is obtained. Since this consistent_lsn points to the
end of a record (or, equivalently, the start of the next record),
there can be a case where no WAL record exists yet at the
consistent_lsn. So, during recovery, the server will wait until it
reads a record at the consistent_lsn (during my testing this wait was
around ~20 sec), and this wait can cause the timeout issue.
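For illustration, with this proposal the recovery settings that
pg_createsubscriber appends in setup_recovery() would look roughly
like the sketch below (the LSN is the one from the example above; the
exact set of parameters written may differ):

recovery_target_lsn = '0/03CBCCA0'   # the slot's consistent_lsn
recovery_target_inclusive = false    # proposed change (currently true)
recovery_target_action = promote
recovery_target_timeline = 'latest'

Recovery then stops just before applying the record at 0/03CBCCA0, so
the COMMIT of tx 1369 reaches the subscriber only once, via logical
replication.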
I have manually debugged and verified the above case, and I think the
change in commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc is still
needed.
I have attached the rebased patches for HEAD, REL_18_STABLE and
REL_17_STABLE.
Thanks,
Shlok Kyal
Attachment | Content-Type | Size
---|---|---
v5_REL_18-0001-Fix-duplicate-insert-during-pg_createsubsc.txt | text/plain | 2.1 KB
server.log | application/octet-stream | 5.1 KB
wal.log | application/octet-stream | 548.3 KB
v5_REL_17-0001-Fix-duplicate-insert-during-pg_createsubsc.txt | text/plain | 2.1 KB
v5_HEAD-0001-Fix-duplicate-insert-during-pg_createsubscri.patch | application/octet-stream | 2.1 KB