Re: Replication slot is not able to sync up

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Replication slot is not able to sync up
Date:	2025-06-14 15:37:05
Message-ID:	CAFiTN-v4=0uYEcsAyJOYeoPLr8sow6UoyB4S3o3w0Ou5bJg=Gg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, May 30, 2025 at 3:38 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:
> >
> > On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > In the case presented here, the logical slot is expected to keep
> > > forwarding, and in the consecutive sync cycle, the sync should be
> > > successful. Users using logical decoding APIs should also be aware
> > > that if, for some reason, the logical slot is not moving forward,
> > > the master/publisher node will start accumulating dead rows and WAL,
> > > which can create bigger problems.
> >
> > I've tried this case and am concerned that the slot synchronization using
> > pg_sync_replication_slots() would never succeed while the primary keeps
> > getting write transactions. Even if the user manually consumes changes on the
> > primary, the primary server keeps advancing its XID in the meantime. On the
> > standby, we ensure that TransamVariables->nextXid is beyond the XID of the
> > WAL record that it's going to apply, so the xmin horizon calculated by
> > GetOldestSafeDecodingTransactionId() always ends up higher than the
> > slot's catalog_xmin on the primary. We get the log message "could not
> > synchronize replication slot "s" because remote slot precedes local slot" and
> > clean up the slot on the standby at the end of pg_sync_replication_slots().
>
> To improve this workload scenario, we can modify pg_sync_replication_slots() to
> wait for the primary slot to advance to a suitable position before completing
> synchronization and removing the temporary slot. This would allow the sync to
> complete as soon as the primary slot advances, whether through
> pg_logical_xx_get_changes() or other ways.
>
> I've created a POC (attached) that currently waits indefinitely for the remote
> slot to catch up. We could later add a timeout parameter to control maximum
> wait time if this approach seems acceptable.
>
> I tested that, when pgbench TPC-B is running on the primary, calling
> pg_sync_replication_slots() on the standby correctly blocks until I advance the
> primary slot position by calling pg_logical_xx_get_changes().
>
> If the basic idea sounds reasonable then I can start a separate
> thread to extend this API. Thoughts?

IMHO, this idea has merit. Have you started a thread for reviewing this patch?
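
For concreteness, the waiting behaviour proposed in the quoted text could be sketched roughly as below. This is illustrative Python, not the attached POC and not PostgreSQL code: SlotPosition, fetch_remote, timeout_s and poll_interval_s are hypothetical names, LSNs and XIDs are modelled as plain integers, and the comparison is only a simplified model of the "remote slot precedes local slot" condition.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotPosition:
    """Simplified slot state: WAL position and catalog xmin as integers."""
    restart_lsn: int
    catalog_xmin: int


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware 32-bit XID comparison (special XIDs ignored here)."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def remote_precedes_local(remote: SlotPosition, local: SlotPosition) -> bool:
    """True while the primary's slot is still behind the standby's local slot."""
    return (remote.restart_lsn < local.restart_lsn
            or xid_precedes(remote.catalog_xmin, local.catalog_xmin))


def wait_for_remote_catchup(
    fetch_remote: Callable[[], SlotPosition],  # hypothetical: re-reads the primary slot
    local: SlotPosition,
    timeout_s: Optional[float] = None,         # None = wait indefinitely
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the remote slot no longer precedes the local one.

    Returns True once the slot could be synced, False if the timeout expired.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    while True:
        if not remote_precedes_local(fetch_remote(), local):
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

With timeout_s=None the loop waits indefinitely, as the POC is described as doing; a finite value corresponds to the timeout parameter suggested as a possible follow-up.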

--
Regards,
Dilip Kumar
Google

In response to

RE: Replication slot is not able to sync up at 2025-05-30 10:07:42 from Zhijie Hou (Fujitsu)

Responses

RE: Replication slot is not able to sync up at 2025-06-16 03:54:18 from Zhijie Hou (Fujitsu)
