| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Orphaned records in pg_replication_origin_status after subscription drop |
| Date: | 2025-12-22 04:46:43 |
| Message-ID: | TY4PR01MB16907837DE56FBF456FD4213394B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Monday, December 22, 2025 7:01 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Sat, Dec 20, 2025 at 02:55:15PM +0530, Amit Kapila wrote:
> > As of today, I can't think of a case where next time (restart after
> > error) we won't generate the same origin_name but I think this will
> > add the dependency that each time the origin name should be generated
> > the same.
>
> ReplicationOriginNameForLogicalRep() would generate the origin name as
> pg_suboid_relid, so we would always reuse the same origin name on restart
> as long as the subscription is not recreated, wouldn't we?
>
> > Also, moving just repl_origin_create part (leaving other things like
> > origin_advance at its location) would need some explanation in
> > comments which is that it has some dependency on DropSubscription and
> > cleanup. Anyway, if we want to go with creating origin in a separate
> > transaction than COPY, then we should change few comments like: "It is
> > possible that the origin is not yet created for tablesync worker, this
> > can happen for the states before SUBREL_STATE_FINISHEDCOPY." in the
> > code.
>
> Hmm. So... Do you think that it should be OK to just create a new transaction
> before we open the transaction that locks
> MyLogicalRepWorker->relid (one that opens a BEGIN READ ONLY on the
> remote side)? And I guess that we would just move the replorigin_by_name(),
> replorigin_create() and ERROR into this new transaction? Setting up
> replorigin_session_setup() and replorigin_session_origin could then be
> delayed until we have done the
> replorigin_advance() in BEGIN READ ONLY transaction? By that I mean to
> leave the replorigin_advance() position untouched. I have studied this code
> quite a bit. I "think" that something among these lines would work, but I am
> not 100% sure if things are OK based the relstate we store in each of these
> phases. If you have an argument that invalidates these lines, please feel free!
I think the solution you outlined works. One nitpick is that, instead of
starting a new transaction, we could create the origin within the same
transaction that updates the DATASYNC states, thereby ensuring the origin
information is available as soon as the catalog is updated. I think the bug
won't happen as long as the origin is created in a transaction separate
from the one setting up the shared memory state.
Additionally, if we relocate the origin creation code, we need to remove the
ERROR report. This is because the origin may already exist if a table sync
restarts due to an ERROR during the initial COPY. This should be safe since the
origin is created with the reserved name "pg_xx," preventing users from creating
another origin with the same name.
I'm attaching a small patch for reference. To verify the fix, I've added a
simple test in 004_sync.pl, which was introduced alongwith ce0fdbfe9722. This
test can reproduce the issue (without the fix) by intentionally causing the
initial COPY to fail.
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Fix-orphaned-origin-in-shared-memory-after-subscr.patch | application/octet-stream | 6.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | shveta malik | 2025-12-22 04:50:30 | Re: Logical Replication of sequences |
| Previous Message | Chao Li | 2025-12-22 02:59:12 | Re: A few patches to clarify snapshot management, part 2 |