| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Orphaned records in pg_replication_origin_status after subscription drop |
| Date: | 2025-12-22 21:22:38 |
| Message-ID: | CAD21AoCiFxFeWNaFKweX3L1op6GS3caxuzFWTdtFL2mfFQ6=gg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Dec 22, 2025 at 2:15 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, December 22, 2025 2:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Mon, Dec 22, 2025 at 10:16 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Monday, December 22, 2025 7:01 AM Michael Paquier
> > <michael(at)paquier(dot)xyz> wrote:
> > > > On Sat, Dec 20, 2025 at 02:55:15PM +0530, Amit Kapila wrote:
> > > > > As of today, I can't think of a case where next time (restart
> > > > > after
> > > > > error) we won't generate the same origin_name but I think this
> > > > > will add the dependency that each time the origin name should be
> > > > > generated the same.
> > > >
> > > > ReplicationOriginNameForLogicalRep() would generate the origin name
> > > > as pg_suboid_relid, so we would always reuse the same origin name on
> > > > restart as long as the subscription is not recreated, wouldn't we?
> > > >
> > > > > Also, moving just repl_origin_create part (leaving other things
> > > > > like origin_advance at its location) would need some explanation
> > > > > in comments which is that it has some dependency on
> > > > > DropSubscription and cleanup. Anyway, if we want to go with
> > > > > creating origin in a separate transaction than COPY, then we
> > > > > should change few comments like: "It is possible that the origin
> > > > > is not yet created for tablesync worker, this can happen for the
> > > > > states before SUBREL_STATE_FINISHEDCOPY." in the code.
> > > >
> > > > Hmm. So... Do you think that it should be OK to just create a new
> > > > transaction before we open the transaction that locks
> > > > MyLogicalRepWorker->relid (one that opens a BEGIN READ ONLY on the
> > > > remote side)? And I guess that we would just move the
> > > > replorigin_by_name(),
> > > > replorigin_create() and ERROR into this new transaction? Setting up
> > > > replorigin_session_setup() and replorigin_session_origin could then
> > > > be delayed until we have done the
> > > > replorigin_advance() in BEGIN READ ONLY transaction? By that I mean
> > > > to leave the replorigin_advance() position untouched. I have
> > > > studied this code quite a bit. I "think" that something among these
> > > > lines would work, but I am not 100% sure if things are OK based the
> > > > relstate we store in each of these phases. If you have an argument that
> > invalidates these lines, please feel free!
> > >
> > > I think the solution you outlined works. One nitpick is that, instead
> > > of starting a new transaction, we could create the origin within the
> > > same transaction that updates the DATASYNC states, thereby ensuring
> > > the origin information is available as soon as the catalog is updated.
> > > I think the bug won't happen as long as the origin is created in a
> > > transaction separate from the one setting up the shared memory state.
> > >
> >
> > Agreed. But please update the comment (/* Update the state and make it
> > visible to others. */) just before that transaction to reflect that origin will also
> > be created in this transaction.
>
> Updated.
>
> >
> > > Additionally, if we relocate the origin creation code, we need to
> > > remove the ERROR report. This is because the origin may already exist
> > > if a table sync restarts due to an ERROR during the initial COPY. This
> > > should be safe since the origin is created with the reserved name
> > > "pg_xx," preventing users from creating another origin with the same name.
> > >
> >
> > Right.
> >
> > + /*
> > + * Advance the origin to the LSN got from walrcv_create_slot. This is
> > + WAL
> > + * logged for the purpose of recovery. Locks are to prevent the
> > + * replication origin from vanishing while advancing.
> > + */
> > + LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
> > + replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
> > + true /* go backward */ , true /* WAL log */ );
> > + UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
> > +
> > /*
> > * Setup replication origin tracking. The purpose of doing this before the
> > * copy is to avoid doing the copy again due to any error in setting up
> > * origin tracking.
> > */
> >
> > Shouldn't the second comment starting from "Setup replication origin .." be
> > merged with the previous one because it is also intended for the previous
> > code change?
>
> Agreed and merged.
>
> Here is the V2 patch which addressed above comments.
>
Thank you for making the patch! The patch looks good to me.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Matthias van de Meent | 2025-12-22 21:28:09 | Vectorize pg_visibility.pg_visibility_map_summary |
| Previous Message | Andreas Karlsson | 2025-12-22 21:17:27 | Re: DOCS - "\d mytable" also shows any publications that publish mytable |