| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Orphaned records in pg_replication_origin_status after subscription drop |
| Date: | 2025-12-23 07:20:45 |
| Message-ID: | aUpCzbbzOHZt7bv7@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Dec 23, 2025 at 08:21:39AM +0900, Michael Paquier wrote:
> Creating the origin at the end of the same transaction that sets the
> state to SUBREL_STATE_DATASYNC seems sensible here. I'll spend a
> couple of extra hours playing with all that across all the branches,
> see if I can wrap it. This includes some more error injection to
> cross-check the state of all these transactions with the states in
> the catalogs while we drop the subscription.
While looking at the state of the code across the six branches where
we need to fix this, there were two points that have been slightly
sticky on my mind:
1) check_old_cluster_subscription_state() in pg_upgrade's check.c,
where we have a set of comments dealing with the reasons why only the
initial and ready states are allowed for the transfers of the relation
data. The patch only makes the origin visible in the catalogs for one
extra state, DATASYNC now, meaning that we have nothing to care about.
I was wondering about the comment of DATASYNC being slightly incorrect
now because it only mentions a replication slot. Do you think that we
should adjust that as well to mention the case of origins, knowing
that their names are also based on subscription OIDs whose value
change across upgrades? That would not apply for relation IDs as
these are fixed, but this feels a bit inexact now for the branches
where this code applies.
2) 88f488319bac and f6c5edb8abca, that have slightly changed the
replication origin logic in v16~. Still, after looking at the code
for a couple of hours, as well as testing the DROP SUBSCRIPTION
stability while enforcing ERRORs in the tablesync worker to play with
the shmem state vs the catalog state, I could not find a hole in the
v15 and v14 code caused by the fact that we have the catalog state for
the origin available one state earlier. At the end I think that we
are OK here in light of these two commits.
The comment could be improved, but I don't see that as a reason good
enough to fix the issue first, so I have applied that down to v14 with
the couple of conflicts handled on the way.
--
Michael
| From | Date | Subject | |
|---|---|---|---|
| Next Message | zengman | 2025-12-23 07:26:20 | Re: Sequence Access Methods, round two |
| Previous Message | Alena Vinter | 2025-12-23 07:02:15 | Startup PANIC on standby promotion due to zero-filled WAL segment |