Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: [PATCH] Preserve replication origin OIDs in pg_upgrade
Date: 2026-06-08 17:06:13
Message-ID: CANhcyEVFSutXXdt_0XMCD37NE7Yd7ROdDVEE06D8YVyPNEBFHg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 28 May 2026 at 08:18, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
>
> On Fri, May 22, 2026 at 8:27 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>>
>> On Fri, May 22, 2026 at 3:16 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>> >
>> > On Mon, 18 May 2026 at 16:13, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>> > >
>> > > Rebased the patch as it was no longer applying.
>> > >
>> > Hi Ajin,
>> >
>> > I have started reviewing the patch. Here is my comment for v6-0002 patch:
>> >
>> > Suppose we have a replication setup: publisher -> subscriber
>> > and we are upgrading subscriber to subscriber_new.
>> > And if initially 'subscriber_new' has a replication origin, upgrading
>> > the cluster can error out.
>> >
>> > Example:
>> > We set up a logical replication between publisher node and subscriber node.
>> >
>> > On subscriber node:
>> > postgres=# SELECT * FROM pg_replication_origin;
>> > roident | roname
>> > ---------+----------
>> > 1 | pg_16393
>> > (1 row)
>> >
>> > And initially subscriber_new has a replication origin:
>> > postgres=# select pg_replication_origin_create('myname');
>> > pg_replication_origin_create
>> > ------------------------------
>> > 1
>> > (1 row)
>> >
>> > postgres=# SELECT * FROM pg_replication_origin;
>> > roident | roname
>> > ---------+--------
>> > 1 | myname
>> > (1 row)
>> >
>> > Now, if we run pg_upgrade to upgrade subscriber node to subscriber_new
>> > node, we get an error:
>> > ```
>> > SELECT pg_catalog.binary_upgrade_create_replication_origin('1'::pg_catalog.oid,
>> > 'pg_16393'::pg_catalog.name, '0/01743078'::pg_catalog.pg_lsn);
>> > psql:subscriber_new/pg_upgrade_output.d/20260522T140312.807/dump/pg_upgrade_dump_globals.sql:37:
>> > ERROR: replication origin with ID 1 already exists
>> > ```
>> >
>> > This error occurs in "Performing Upgrade" stage. Should we add a check
>> > in the "Performing Consistency Checks" stage so that we don't need to
>> > re-initdb the new cluster to perform the upgrade?
>> > Maybe we can add a check similar to
>> > check_new_cluster_replication_slots(), where pg_upgrade errors out if
>> > the new cluster already contains replication origins. Thoughts?
>>
>> +1. I had the same thought while reviewing the patch today. We should
>> have it unless there is a reason we have avoided it??
>
>
> Fixed this. Added a new check for replication origins and if the new cluster has any existing replication origins, then the check will fail.
>>
>>
>> Few trivial comments:
>>
>> 1)
>>
>> +#include "access/skey.h"
>> +#include "catalog/indexing.h"
>>
>> pg_upgrade_support.c compiles without above.
>>
>
> Removed.
>
>>
>> 2)
>> + Assert(!OidIsValid(rel->rd_rel->reltoastrelid));
>>
>> Is there a reason for this sanity check? I generally do not see a
>> Null-Toast table sanity check after every table_open.
>>
>
> Removed.
>
>>
>> 3)
>>
>> +
>> + /* Dump replication origins */
>> + if (server_version >= 170000 && binary_upgrade && archDumpFormat == archNull)
>> + dumpReplicationOrigins(conn);
>>
>> why the check is for PG17 specifically?
>>
>
> In PG17, we started migrating pg_subscription_rel and the remote LSN during upgrades; prior to that, these were not migrated. Given that change, it also makes sense to migrate replication origins from them. Otherwise, when upgrading from PG17 to a later version, you could end up with a subscription where pg_subscription_rel and the remote LSN are migrated, but the corresponding replication origin is not created.
>
>
>
> On Mon, May 25, 2026 at 5:13 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>>
>>
>> One issue in 002:
>>
>> binary_upgrade_create_replication_origin() has this:
>>
>> + originname = PG_GETARG_NAME(1);
>> +
>> + roname_d = CStringGetTextDatum(NameStr(*originname));
>> +
>>
>> We are getting origin-name (text) into Name-type which can not be more
>> than 64 bytes. So if an origin has name more than 64, it will end up
>> trimming the name post-upgrade.
>>
>> I tried this:
>>
>> Old-setup:
>> postgres=# SELECT
>> pg_replication_origin_create('this_is_a_very_long_replication_origin_name_that_exceeds_the_limit_of_64');
>> pg_replication_origin_create
>> ------------------------------
>> 1
>> postgres=# select * from pg_replication_origin;
>> roident | roname
>> ---------+----------------------------------------------------------------------
>> 1 | this_is_a_very_long_replication_origin_name_that_exceeds_the_limit_of_64
>>
>>
>> Post-upgrade: name got trimmed to 64 length.
>> -------------------------
>> postgres=# select * from pg_replication_origin;
>> roident | roname
>> ---------+-----------------------------------------------------------------
>> 1 | this_is_a_very_long_replication_origin_name_that_exceeds_the_li
>>
>> thanks
>> Shveta
>
>
> Fixed this. Now binary_upgrade_create_replication_origin handles it similarly to the way pg_replication_origin_create handles the name of the origin.
>
> Here's an updated version v7 containing these fixes.

Hi Ajin,
I have reviewed the patch and here are some comments:

1. I want to clarify, what is the expected behaviour when we upgrade
the postgres instance from version <= 16 to "HEAD + patch"?
- ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname,
sizeof(originname));
- replorigin_create(originname);
+ if (!IsBinaryUpgrade)
+ {
+ ReplicationOriginNameForLogicalRep(subid, InvalidOid,
originname, sizeof(originname));
+ replorigin_create(originname);
+ }
and
+ /* Dump replication origins */
+ if (server_version >= 170000 && binary_upgrade &&
archDumpFormat == archNull)
+ dumpReplicationOrigins(conn);

When the postgres instance with subscription is updated from PG 16 (or
less) to HEAD, the replication origin is created as we do not have "if
(!IsBinaryUpgrade)" check in HEAD,
whereas when the similar instance is upgraded to "HEAD + patch", no
replication origin is present after the upgrade.
Is this difference in the behaviour between HEAD and "HEAD + patch" expected?

2.a) Can we replace:
+ /* Check for OID collision */
+ ScanKeyInit(&key,
+ Anum_pg_replication_origin_roident,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(node));
+ scan = systable_beginscan(rel, ReplicationOriginIdentIndex,
+ true /* indexOK */,
+ SnapshotSelf,
+ 1, &key);
+ collides = HeapTupleIsValid(systable_getnext(scan));
+ systable_endscan(scan);
+
+ if (collides)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin with ID %u already
exists", node_oid)));

with:
if (SearchSysCacheExists1(REPLORIGIDENT, ObjectIdGetDatum(node)))
ereport(ERROR, errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("replication origin with ID %u already
exists", node_oid));

2.b) Similarly, Can we replace:
+ /* Check for name collision */
+ ScanKeyInit(&key,
+ Anum_pg_replication_origin_roname,
+ BTEqualStrategyNumber, F_TEXTEQ,
+ roname_d);
+ scan = systable_beginscan(rel, ReplicationOriginNameIndex,
+ true /* indexOK */,
+ SnapshotSelf,
+ 1, &key);
+ collides = HeapTupleIsValid(systable_getnext(scan));
+ systable_endscan(scan);
+
+ if (collides)
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
with:
if (SearchSysCacheExists1(REPLORIGNAME, roname_d))
ereport(ERROR, errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("replication origin \"%s\" already
exists", originname));

This will make the code smaller and easy to read. Thoughts?

Thanks,
Shlok Kyal

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-06-08 17:33:04 Re: Fix bug of UPDATE/DELETE FOR PORTION OF with inheritance tables
Previous Message Sami Imseih 2026-06-08 16:35:53 Re: Report oldest xmin source when autovacuum cannot remove tuples