| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [PATCH] Preserve replication origin OIDs in pg_upgrade |
| Date: | 2026-07-01 02:49:49 |
| Message-ID: | CAFPTHDYfDHQeQshXi7GDOQz3N8YdY6SNSmpoSAbnaQPBkVfv6A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jun 24, 2026 at 2:07 AM Rui Zhao <zhaorui126(at)gmail(dot)com> wrote:
>
> Hi Ajin,
>
> Thanks for v9, and for addressing both my earlier comments. I also went
> through the rest of v9 -- the other reviewers' comments and the
> corresponding fixes (including the replorigin_create() locking rework) look
> good to me; I didn't find any issues there.
>
> A follow-up on the version gating, though: I think it now moved a bit too
> far. get_replication_origin_info() is called unconditionally (outside the
> >= 1700 block):
>
> /* Get replication origin information */
> get_replication_origin_info(&old_cluster);
>
> and it runs "SELECT count(*) FROM pg_catalog.pg_replication_origin".
> pg_dumpall's dumpReplicationOrigins() queries the same catalog with no
> version guard either. But pg_replication_origin only exists from 9.5
> (commit 5aa2350426), while pg_upgrade still accepts old clusters back to
> 9.2:
>
> if (GET_MAJOR_VERSION(old_cluster.major_version) < 902)
> pg_fatal("This utility can only upgrade from PostgreSQL version %s
> and later.", "9.2");
>
> So upgrading from 9.2/9.3/9.4 now fails: the query hits a non-existent
> catalog and pg_fatal()s (and dumpReplicationOrigins would too).
> It should be gated on the version where origins were introduced, e.g.
>
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 905)
> get_replication_origin_info(&old_cluster);
>
> with the matching server_version >= 90500 check on the pg_dumpall side.
> That keeps the < 17 case I flagged working while not breaking 9.2-9.4
> (which have no origins to migrate anyway).
>
Thanks, I have added both the flags.
> One nit while here: there is an extra blank line after the
> check_new_cluster_replication_origins() call in check_new_cluster().
>
Removed.
On Wed, Jun 24, 2026 at 3:09 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Jun 23, 2026 at 5:20 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > All the above changes have been addressed in patch v9.
> >
>
> Thanks for the patches. A few ocmments:
>
> 1)
>
> + * In binary-upgrade mode, skip origin creation here. This is required to
> + * preserve the roident from the old cluster for this subscription.
>
> subscription --> subscription's origin
>
Changed.
> 2)
> replorigin_create_with_id():
>
> + if (SearchSysCacheExists1(REPLORIGNAME, roname_d))
> + ereport(ERROR,
> + errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin \"%s\" already exists", roname));
>
> Can this ever happen? IIUC, since new-cluster should not have any
> origin (as per the check introduced) and old cluster can not have
> duplicate names, we should not hit this error. Do we want to keep it
> as sanity check? If so, can we put a comment atop it.
>
I've removed it.
> 3)
> replorigin_create_with_id:
>
> + if (remote_lsn != InvalidXLogRecPtr)
> + replorigin_advance(roident, remote_lsn, InvalidXLogRecPtr,
> + false /* backward */,
> + false /* WAL log */);
>
> We do not want to call 'advance' from regular 'replorigin_create'
> flow. It is only for binary-Upgrade flow. Do you think we should have
> a sanity check here?
>
> if (remote_lsn != InvalidXLogRecPtr)
> {
> Assert(IsBinaryUpgrade)
> replorigin_advance(..)
> }
>
Done.
> 4)
> replorigin_create:
>
> + found = true;
>
> + if (!found)
> ereport(ERROR,
> (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> errmsg("could not find free replication origin ID")));
>
> Do you think we can make use of 'collides' itself here instead of
> adding a new 'found' variable. We can move 'collides' outside of loop
> and use it to emit above error.
>
I've removed found and instead checked if the roident has reached the max limit
> 5)
> binary_upgrade_create_replication_origin:
>
> + node = (ReplOriginId) node_oid;
> +
> + if (SearchSysCacheExists1(REPLORIGIDENT, ObjectIdGetDatum(node)))
> + ereport(ERROR,
> + errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("replication origin with ID %u already exists",
> + (Oid) node));
>
> How can we hit this? Same comment as #2.
>
Removed
> 6)
> -check_new_cluster_subscription_configuration(void)
> +check_new_cluster_replication_origins(void)
>
> - /* Quick return if there are no subscriptions to be migrated. */
> - if (old_cluster.nsubs == 0)
> - return;
>
> - if (old_cluster.nsubs > max_active_replication_origins)
>
> With these changes, do we even need to capture old-cluster's 'nsubs'
> in get_subscription_info(). I do not see any other usage of nsubs. Let
> me know if I am missing any case.
I've removed nsubs
On Wed, Jun 24, 2026 at 3:21 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jun 24, 2026 at 10:39 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> >
> > 2)
> > replorigin_create_with_id():
> >
> > + if (SearchSysCacheExists1(REPLORIGNAME, roname_d))
> > + ereport(ERROR,
> > + errcode(ERRCODE_DUPLICATE_OBJECT),
> > + errmsg("replication origin \"%s\" already exists", roname));
> >
> > Can this ever happen? IIUC, since new-cluster should not have any
> > origin (as per the check introduced) and old cluster can not have
> > duplicate names, we should not hit this error. Do we want to keep it
> > as sanity check? If so, can we put a comment atop it.
> >
>
> Analyzed a bit more, with this change in patch002, we see an error change:
>
> Without patch:
> postgres=# SELECT pg_replication_origin_create('abcd');
> pg_replication_origin_create
> ------------------------------
> 1
> postgres=# SELECT pg_replication_origin_create('abcd');
> ERROR: duplicate key value violates unique constraint
> "pg_replication_origin_roname_index"
> DETAIL: Key (roname)=(abcd) already exists.
>
> ----
>
> With patch:
> postgres=# SELECT pg_replication_origin_create('abcd');
> pg_replication_origin_create
> ------------------------------
> 1
>
> postgres=# SELECT pg_replication_origin_create('abcd');
> ERROR: replication origin "abcd" already exists
>
>
> So without the new check also, we can prevent duplicate name insertion
> (even if that happens during upgrade, which I can not see how). The
> new error and the check is not wrong, but slightly redundant. Or let
> me know if I am missing something.
>
I've removed that check.
On Wed, Jun 24, 2026 at 11:27 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> Thanks for the updated patch. I have reviewed the patch and here are
> some my comments:
>
> 1. Since we are quering only one column, do we need to use 'i_norigins'?
>
> +void
> +get_replication_origin_info(ClusterInfo *cluster)
> +{
> + PGconn *conn;
> + PGresult *res;
> + int i_norigins;
> +
> + conn = connectToServer(cluster, "template1");
> + res = executeQueryOrDie(conn, "SELECT count(*) AS norigins "
> + "FROM pg_catalog.pg_replication_origin");
> + i_norigins = PQfnumber(res, "norigins");
> +
> + cluster->nrepl_origins = atoi(PQgetvalue(res, 0, i_norigins));
> + PQclear(res);
> +
> + PQfinish(conn);
> +}
> I think we can directly use:
> cluster->nrepl_origins = atoi(PQgetvalue(res, 0, 0));
>
> Thoughts?
Removed it.
>
> 2. In the above code should we also add a check like:
> if (PQntuples(res) != 1)
> pg_fatal("could not count the number of replication origins");
>
Added.
> 3. The brackets are not required:
> + if (!tablespaces_only && !roles_only && binary_upgrade)
> + {
> + /* Dump replication origins */
> + dumpReplicationOrigins(conn);
> }
> We can move the comment '/* Dump replication origins */' above if condition.
> It will make it consistent with the existing code.
>
I've moved the comment up but keeping the brackets as it has more than
one line, that is the recommendation from Tom Lane.
> 4. I ran the tests locally and found that the test_decoding test suit
> is failing:
> not ok 10 - replorigin 249 ms
>
> diff:
> -- ensure duplicate creations fail
> SELECT pg_replication_origin_create('regress_test_decoding: regression_slot');
> -ERROR: duplicate key value violates unique constraint
> "pg_replication_origin_roname_index"
> -DETAIL: Key (roname)=(regress_test_decoding: regression_slot) already exists.
> +ERROR: replication origin "regress_test_decoding: regression_slot"
> already exists
> -- ensure inactive origin cannot be set as session one if pid is specified
> SELECT pg_replication_origin_session_setup('regress_test_decoding:
> regression_slot', -1);
> ERROR: cannot use PID -1 for inactive replication origin with ID 1
>
> I think this is expected because this patch adds a new function
> 'replorigin_create_with_id'
> which adds a check for the already existing replication origin.
> In HEAD we don't have such a check. So I think we should update the
> expected output file.
>
Based on feedback from Shveta, I've removed that new error as changing
existing behaviour isn't required.
On Thu, Jun 25, 2026 at 2:45 AM Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> wrote:
>
> Hello
>
> +# Verify that the subscription related replication origins are
> preserved after upgrade.
> +my $post_roident_rows = $new_sub->safe_psql('postgres',
> + "SELECT s.subname, o.roident
> + FROM pg_subscription s
> + JOIN pg_replication_origin o ON o.roname = 'pg_' || s.oid::text
> + ORDER BY s.subname"
> +);
> +for my $row (split /\n/, $post_roident_rows)
> +{
> + my ($subname, $roident) = split /\|/, $row;
> + is($roident, $pre_upgrade_roident{$subname},
> + "roident preserved for subscription '$subname' after upgrade");
> +}
> +
>
> Won't this check miss (and silently pass) if an entry is missing
> post-upgrade? It verifies that all entries that exists match the pre
> upgrade entries, but it doesn't verify that the list is the same.
Modified this to iterate over pre upgrade and post upgrade
subscriptions making sure both are present.
All the above changes are updated in patch v10.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v10-0001-Preserve-subscription-OIDs-during-pg_upgrade.patch | application/octet-stream | 9.1 KB |
| v10-0002-Preserve-replication-origin-OIDs-during-pg_upgra.patch | application/octet-stream | 33.9 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-07-01 03:18:23 | Re: Refactor function pg_get_multixact_stats (src/backend/utils/adt/multixactfuncs.c) |
| Previous Message | Peter Smith | 2026-07-01 02:46:39 | Re: Include sequences in publications created by pg_createsubscriber |