Re: Orphaned records in pg_replication_origin_status after subscription drop

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Orphaned records in pg_replication_origin_status after subscription drop
Date: 2025-12-22 06:08:32
Message-ID: CAA4eK1KE0gvaGBjXOiFot0PJNZ+85vik6KN4UTe5AZ7tW6B34A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 22, 2025 at 10:16 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, December 22, 2025 7:01 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > On Sat, Dec 20, 2025 at 02:55:15PM +0530, Amit Kapila wrote:
> > > As of today, I can't think of a case where next time (restart after
> > > error) we won't generate the same origin_name but I think this will
> > > add the dependency that each time the origin name should be generated
> > > the same.
> >
> > ReplicationOriginNameForLogicalRep() would generate the origin name as
> > pg_suboid_relid, so we would always reuse the same origin name on restart
> > as long as the subscription is not recreated, wouldn't we?
> >
> > > Also, moving just repl_origin_create part (leaving other things like
> > > origin_advance at its location) would need some explanation in
> > > comments which is that it has some dependency on DropSubscription and
> > > cleanup. Anyway, if we want to go with creating origin in a separate
> > > transaction than COPY, then we should change few comments like: "It is
> > > possible that the origin is not yet created for tablesync worker, this
> > > can happen for the states before SUBREL_STATE_FINISHEDCOPY." in the
> > > code.
> >
> > Hmm. So... Do you think that it should be OK to just create a new transaction
> > before we open the transaction that locks
> > MyLogicalRepWorker->relid (one that opens a BEGIN READ ONLY on the
> > remote side)? And I guess that we would just move the replorigin_by_name(),
> > replorigin_create() and ERROR into this new transaction? Setting up
> > replorigin_session_setup() and replorigin_session_origin could then be
> > delayed until we have done the
> > replorigin_advance() in BEGIN READ ONLY transaction? By that I mean to
> > leave the replorigin_advance() position untouched. I have studied this code
> > quite a bit. I "think" that something among these lines would work, but I am
> > not 100% sure if things are OK based the relstate we store in each of these
> > phases. If you have an argument that invalidates these lines, please feel free!
>
> I think the solution you outlined works. One nitpick is that, instead of
> starting a new transaction, we could create the origin within the same
> transaction that updates the DATASYNC states, thereby ensuring the origin
> information is available as soon as the catalog is updated. I think the bug
> won't happen as long as the origin is created in a transaction separate
> from the one setting up the shared memory state.
>

Agreed. But please update the comment (/* Update the state and make it
visible to others. */) just before that transaction to reflect that
origin will also be created in this transaction.

> Additionally, if we relocate the origin creation code, we need to remove the
> ERROR report. This is because the origin may already exist if a table sync
> restarts due to an ERROR during the initial COPY. This should be safe since the
> origin is created with the reserved name "pg_xx," preventing users from creating
> another origin with the same name.
>

Right.

+ /*
+ * Advance the origin to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
/*
* Setup replication origin tracking. The purpose of doing this before the
* copy is to avoid doing the copy again due to any error in setting up
* origin tracking.
*/

Shouldn't the second comment starting from "Setup replication origin
.." be merged with the previous one because it is also intended for
the previous code change?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-12-22 06:26:57 Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions
Previous Message jian he 2025-12-22 06:08:00 Re: Mention TABLEFUNC to make comment consistent with code