Re: pg_upgrade and logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: pg_upgrade and logical replication
Date: 2023-05-10 07:59:24
Message-ID: CAHut+PvYt0yVJ+eAg0AdTHyTmgoZVY9znk3uxe98jOytmeCwLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for the v5-0001 patch code.

======
General

1. ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

I was a bit confused by this relation 'state' mentioned in multiple
places. IIUC the pg_upgrade logic is going to reject anything with a
non-READY (not 'r') state anyhow, so what is the point of having all
the extra grammar/parse_subscription_options etc to handle setting the
state when only possible value must be 'r'?

~~~

2. state V relstate

I still feel code readbility suffers a bit by calling some fields/vars
a generic 'state' instead of the more descriptive 'relstate'. Maybe
it's just me.

Previously commented same (see [1]#3, #4, #5)

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+ <para>
+ Fully preserve the logical subscription state if any. That includes
+ the underlying replication origin with their remote LSN and the list of
+ relations in each subscription so that replication can be simply
+ resumed if the subscriptions are reactivated.
+ </para>

I think the "if any" part is not necessary. If you remove those words,
then the rest of the sentence can be simplified.

SUGGESTION
Fully preserve the logical subscription state, which includes the
underlying replication origin's remote LSN, and the list of relations
in each subscription. This allows replication to simply resume when
the subscriptions are reactivated.

~~~

4.
+ <para>
+ If this option isn't used, it is up to the user to reactivate the
+ subscriptions in a suitable way; see the subscription part in <xref
+ linkend="pg-dump-notes"/> for more information.
+ </para>

The link still renders strangely as previously reported (see [1]#2b).

~~~

5.
+ <para>
+ If this option is used and any of the subscription on the old cluster
+ has an unknown <varname>remote_lsn</varname> (0/0), or has any relation
+ in a state different from <literal>r</literal> (ready), the
+ <application>pg_upgrade</application> run will error.
+ </para>

5a.
/subscription/subscriptions/

~

5b
"has any relation in a state different from r" --> "has any relation
with state other than r"

======
src/backend/commands/subscriptioncmds.c

6.
+ if (strlen(state_str) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid relation state: %s", state_str)));

Is this relation state validation overly simplistic, by only checking
for length 1? Shouldn't this just be asserting the relstate must be
'r'?

======
src/bin/pg_dump/pg_dump.c

7. getSubscriptionTables

+/*
+ * getSubscriptionTables
+ * get information about the given subscription's relations
+ */
+void
+getSubscriptionTables(Archive *fout)
+{
+ SubscriptionInfo *subinfo;
+ SubRelInfo *rels = NULL;
+ PQExpBuffer query;
+ PGresult *res;
+ int i_srsubid;
+ int i_srrelid;
+ int i_srsubstate;
+ int i_srsublsn;
+ int i_nrels;
+ int i,
+ cur_rel = 0,
+ ntups,
+ last_srsubid = InvalidOid;

Why some above are single int declarations and some are compound int
declarations? Why not make them all consistent?

~

8.
+ appendPQExpBuffer(query, "SELECT srsubid, srrelid, srsubstate, srsublsn,"
+ " count(*) OVER (PARTITION BY srsubid) AS nrels"
+ " FROM pg_subscription_rel"
+ " ORDER BY srsubid");

Should this SQL be schema-qualified like pg_catalog.pg_subscription_rel?

~

9.
+ for (i = 0; i < ntups; i++)
+ {
+ int cur_srsubid = atooid(PQgetvalue(res, i, i_srsubid));

Should 'cur_srsubid' be declared Oid to match the atooid?

~~~

10. getSubscriptions

+ if (PQgetisnull(res, i, i_suboriginremotelsn))
+ subinfo[i].suboriginremotelsn = NULL;
+ else
+ subinfo[i].suboriginremotelsn =
+ pg_strdup(PQgetvalue(res, i, i_suboriginremotelsn));
+
+ /*
+ * For now assume there's no relation associated with the
+ * subscription. Later code might update this field and allocate
+ * subrels as needed.
+ */
+ subinfo[i].nrels = 0;

The wording "For now assume there's no" kind of gives an ambiguous
interpretation for this comment. IMO it sounds like this is the
"current" logic but some future PG version may behave differently - I
don't think that is the intended meaning at all.

SUGGESTION.
Here we just initialize nrels to say there are 0 relations associated
with the subscription. If necessary, subsequent logic will update this
field and allocate the subrels.

~~~

11. dumpSubscription

+ for (i = 0; i < subinfo->nrels; i++)
+ {
+ appendPQExpBuffer(query, "\nALTER SUBSCRIPTION %s ADD TABLE "
+ "(relid = %u, state = '%c'",
+ qsubname,
+ subinfo->subrels[i].srrelid,
+ subinfo->subrels[i].srsubstate);
+
+ if (subinfo->subrels[i].srsublsn[0] != '\0')
+ appendPQExpBuffer(query, ", LSN = '%s'",
+ subinfo->subrels[i].srsublsn);
+
+ appendPQExpBufferStr(query, ");");
+ }

I previously asked ([1]#11) about how can this ALTER SUBSCRIPTION
TABLE code happen unless 'preserve_subscriptions' is true, and you
confirmed "It indirectly is, as in that case subinfo->nrels is
guaranteed to be 0. I just tried to keep the code simpler and avoid
too many nested conditions."

~

If you are worried about too many nested conditions then a simple
Assert(dopt->preserve_subscriptions); might be good to have here.

======
src/bin/pg_upgrade/check.c

12. check_and_dump_old_cluster

+ /* PG 10 introduced subscriptions. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1000 &&
+ user_opts.preserve_subscriptions)
+ {
+ check_for_subscription_state(&old_cluster);
+ }

12a.
All the other checks in this function seem to be in decreasing order
of PG version so maybe this check should be moved to follow that same
pattern.

~

12b.
Also won't it be better to give some error or notice of some kind if
the option/version are incompatible? I think this was mentioned in a
previous review.

e.g.

if (user_opts.preserve_subscriptions)
{
if (GET_MAJOR_VERSION(old_cluster.major_version) < 1000)
<pg_log or pg_fatal goes here...>;
check_for_subscription_state(&old_cluster);
}

~~~

13. check_for_subscription_state

+ for (int i = 0; i < ntup; i++)
+ {
+ is_error = true;
+ pg_log(PG_WARNING,
+ "\nWARNING: subscription \"%s\" has an invalid remote_lsn",
+ PQgetvalue(res, 0, 0));
+ }

13a.
This WARNING does not mention the database, but a similar warning
later about the non-ready state does mention the database. Probably
they should be consistent.

~

13b.
Something seems amiss. Here the is_error is assigned true; But later
when you test is_error that is for logging the ready-state problem.
Isn't there another missing pg_fatal for this invalid remote_lsn case?

======
src/bin/pg_upgrade/option.c

14. usage

+ printf(_(" --preserve-subscription-state preserve the subscription
state fully\n"));

Why say "fully"? How is "preserve the subscription state fully"
different to "preserve the subscription state" from the user's POV?

------
[1] My previous v4 code review -
https://www.postgresql.org/message-id/CAHut%2BPuThBY%3DMSYHRgUa6iv6tyCmnqU78itZ%2Bf4rMM2b124vqQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-05-10 08:08:32 Re: pg_upgrade and logical replication
Previous Message Drouvot, Bertrand 2023-05-10 06:39:08 Re: walsender performance regression due to logical decoding on standby changes