Re: [PATCH] Preserve replication origin OIDs in pg_upgrade

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: [PATCH] Preserve replication origin OIDs in pg_upgrade
Date: 2026-06-17 09:19:19
Message-ID: CAFPTHDbyVt_ggdi+rOP4=jBmgU=DDgYWuLxc4oQ6ux-T0H-ZBg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 5, 2026 at 8:10 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> Few comments:
> 1) pg_dumpall dumps all replication origins, not just subscription
> origins and recreates every origin:
> SELECT o.*, os.remote_lsn
> FROM pg_replication_origin o
> LEFT JOIN pg_replication_origin_status os
> ON o.roident = os.local_id
>
> Is pg_upgrade expected to preserve user-created origins?
> If so, you can mention this in the commit message
>

I explicitly state that all replication origins are migrated in the
commit message, nothing specific about subscription origins.

> 2) Should the table lock be released here, if so should we mention
> comments similar to how it is mentioned in
> binary_upgarde_add_sub_rel_state:
> + /* Restore the remote_lsn if provided, while still holding the lock */
> + if (!PG_ARGISNULL(2))
> + {
> + XLogRecPtr remote_commit = PG_GETARG_LSN(2);
>
> - ReplicationSlotRelease();
> + replorigin_advance(node, remote_commit, InvalidXLogRecPtr,
> + false /* backward */,
> + false /* WAL log */);
> + }
> +
> + table_close(rel, RowExclusiveLock);
>

Is that relevant when creating a new row? or only when updating an existing one?

> 3) We can use new style of ereport to exclude '(' before errcode:
> 3.a)
> + ereport(ERROR,
> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> + errmsg("replication origin name is too long"),
> + errdetail("Replication origin names must be no longer than %d bytes.",
> + MAX_RONAME_LEN)));
>
> 3.b) Similarly here:
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin with ID %u already exists", node_oid)));
>
> 3.c) There are many others, you can search other ereport in the patch
> and change it too.
>

Done.

> 4) Since only one column will be returned here, can we use
> "PQgetvalue(res, 0, 0) and remove PQfnumber:
> + res = executeQueryOrDie(conn,
> + "SELECT count(*) AS nrepl_origins "
> + "FROM pg_catalog.pg_replication_origin");
> + i_nrepl_origins = PQfnumber(res, "nrepl_origins");
> + cluster->nrepl_origins = atoi(PQgetvalue(res, 0, i_nrepl_origins));
> + PQclear(res);
>

Done.

> 5) replorigin_create also has some duplicate code like below, will it
> be possible to have a common function so that both of them can use?
> /* Check for name collision */
> + ScanKeyInit(&key,
> + Anum_pg_replication_origin_roname,
> + BTEqualStrategyNumber, F_TEXTEQ,
> + roname_d);
> + scan = systable_beginscan(rel, ReplicationOriginNameIndex,
> + true /* indexOK */,
> + SnapshotSelf,
> + 1, &key);
> + collides = HeapTupleIsValid(systable_getnext(scan));
> + systable_endscan(scan);
> +
> + if (collides)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin \"%s\" already exists",
> + originname)));
> +
> + memset(&nulls, 0, sizeof(nulls));
> + memset(&values, 0, sizeof(values));
> +
> + values[Anum_pg_replication_origin_roident - 1] = ObjectIdGetDatum(node);
> + values[Anum_pg_replication_origin_roname - 1] = roname_d;
> +
> + tuple = heap_form_tuple(RelationGetDescr(rel), values, nulls);
> + CatalogTupleInsert(rel, tuple);
> + heap_freetuple(tuple);
> + CommandCounterIncrement();
>

I have created a new function replorigin_create_with_id() which has
some common code and called that from both replorigin_create() and
binary_upgrade_create_replication_origin()

On Tue, Jun 9, 2026 at 3:06 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Ajin,
> I have reviewed the patch and here are some comments:
>
> 1. I want to clarify, what is the expected behaviour when we upgrade
> the postgres instance from version <= 16 to "HEAD + patch"?
> - ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname,
> sizeof(originname));
> - replorigin_create(originname);
> + if (!IsBinaryUpgrade)
> + {
> + ReplicationOriginNameForLogicalRep(subid, InvalidOid,
> originname, sizeof(originname));
> + replorigin_create(originname);
> + }
> and
> + /* Dump replication origins */
> + if (server_version >= 170000 && binary_upgrade &&
> archDumpFormat == archNull)
> + dumpReplicationOrigins(conn);
>
> When the postgres instance with subscription is updated from PG 16 (or
> less) to HEAD, the replication origin is created as we do not have "if
> (!IsBinaryUpgrade)" check in HEAD,
> whereas when the similar instance is upgraded to "HEAD + patch", no
> replication origin is present after the upgrade.
> Is this difference in the behaviour between HEAD and "HEAD + patch" expected?
>

I have modified this as option 3 suggested by Shveta and Amit and
removed the PG17 restriction for migrating replication origins, now it
does it for all versions that have replication origins.

> 2.a) Can we replace:
> + /* Check for OID collision */
> + ScanKeyInit(&key,
> + Anum_pg_replication_origin_roident,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(node));
> + scan = systable_beginscan(rel, ReplicationOriginIdentIndex,
> + true /* indexOK */,
> + SnapshotSelf,
> + 1, &key);
> + collides = HeapTupleIsValid(systable_getnext(scan));
> + systable_endscan(scan);
> +
> + if (collides)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin with ID %u already
> exists", node_oid)));
>
> with:
> if (SearchSysCacheExists1(REPLORIGIDENT, ObjectIdGetDatum(node)))
> ereport(ERROR, errcode(ERRCODE_DUPLICATE_OBJECT),
> errmsg("replication origin with ID %u already
> exists", node_oid));
>
> 2.b) Similarly, Can we replace:
> + /* Check for name collision */
> + ScanKeyInit(&key,
> + Anum_pg_replication_origin_roname,
> + BTEqualStrategyNumber, F_TEXTEQ,
> + roname_d);
> + scan = systable_beginscan(rel, ReplicationOriginNameIndex,
> + true /* indexOK */,
> + SnapshotSelf,
> + 1, &key);
> + collides = HeapTupleIsValid(systable_getnext(scan));
> + systable_endscan(scan);
> +
> + if (collides)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin \"%s\" already exists",
> + originname)));
> with:
> if (SearchSysCacheExists1(REPLORIGNAME, roname_d))
> ereport(ERROR, errcode(ERRCODE_DUPLICATE_OBJECT),
> errmsg("replication origin \"%s\" already
> exists", originname));
>
> This will make the code smaller and easy to read. Thoughts?

Done

On Tue, Jun 9, 2026 at 2:13 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
>
> Please find a few comments on v007:
>
> 1)
>
> + node = (ReplOriginId) node_oid;
> + originname = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + if (strlen(originname) > MAX_RONAME_LEN)
> + ereport(ERROR,
> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> + errmsg("replication origin name is too long"),
> + errdetail("Replication origin names must be no longer than %d bytes.",
> + MAX_RONAME_LEN)));
> +
> + roname_d = CStringGetTextDatum(originname);
>
> CStringGetTextDatum is:
> #define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
> ---------
>
> We are first converting text to C-string , then Cstring to text and
> then getting Datum. Double allocation and double conversion. Can we
> please try this instead and verify if it works:
>
> text *origin_text = PG_GETARG_TEXT_PP(1);
> originname = text_to_cstring(origin_text);
> roname_d = PointerGetDatum(origin_text);
>

I've restructured the code now, its a bit tough to do this now.

> 2)
> +check_new_cluster_replication_origins(void)
> +{
> + PGconn *conn;
> + PGresult *res;
> + int norigins;
> +
> + /* Quick return if there are no replication origins to migrate. */
> + if (old_cluster.nrepl_origins == 0)
> + return;
>
> Don't we need a version check simialr to what we have introduced in
> pg_dumpall for origins? (Similar to
> check_new_cluster_replication_slots as well)
>

I've removed the version check for PG17 now.

> 3)
> Since we have introduced check_new_cluster_replication_origins, do we
> even need below checks in binary_upgrade_create_replication_origin()?
> In which conditions will they be hit?
>
> + /* Check for OID collision */
> + ....
> + if (collides)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin with ID %u already exists", node_oid)));
> +
> + /* Check for name collision */
> + ....
> + if (collides)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin \"%s\" already exists",
> + originname)));
>

With the restructured code, its not easy to do this now.

>
> 4)
> Now since check_new_cluster_subscription_configuration() is purely
> about checking max-origin GUC configuration, I think the name is
> misleading. IMO, we should merge it to
> check_new_cluster_replication_origins(). See how
> check_new_cluster_replication_slots() does it: checking both
> new-cluster's count and the max-configuration for slots.
>

Done.

On Tue, Jun 9, 2026 at 9:32 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> >
> One more comment:
>
> 3. In logical-replication.sgml, we have documentation as:
> "Commit timestamps and origin data are not preserved during the upgrade."
> Now since the origin related data is preserved after the upgrade,
> should we update the documentation here?
>

Updated.

I have addressed the comments in v8 (attached). I've also now taken
out the check for PG17 on old cluster and now all clusters which have
replication origins are migrated.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v8-0001-Preserve-subscription-OIDs-during-pg_upgrade.patch application/octet-stream 9.1 KB
v8-0002-Preserve-replication-origin-OIDs-during-pg_upgrad.patch application/octet-stream 28.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tender Wang 2026-06-17 09:56:24 Re: btree_gist: Fix NaN handling in float4 and float8 opclasses
Previous Message Lucas DRAESCHER 2026-06-17 09:09:46 Re: [Bug Report + Patch] File descriptor leak when io_method=io_uring