RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-26 04:25:34
Message-ID: TYAPR01MB586620409507691E8116EA62F5E2A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thank you for reviewing! The patch can be available in [1].

> Here are my review comments for patch v25-0003.
>
> ======
> src/bin/pg_upgrade/check.c
>
> 1. GENERAL
>
> +static void check_for_confirmed_flush_lsn(void);
> +static void check_for_lost_slots(void);
>
> For more clarity, I wonder if it is better to rename some functions:
>
> check_for_confirmed_flush_lsn() -> check_old_cluster_for_confirmed_flush_lsn()
> check_for_lost_slots() -> check_old_cluster_for_lost_slots()

Replaced.

> 2.
> + /*
> + * Logical replication slots can be migrated since PG17. See comments atop
> + * get_logical_slot_infos().
> + */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> + {
> + check_for_lost_slots();
> +
> + /*
> + * Do additional checks if a live check is not required. This requires
> + * that confirmed_flush_lsn of all the slots is the same as the latest
> + * checkpoint location, but it would be satisfied only when the server
> + * has been shut down.
> + */
> + if (!live_check)
> + check_for_confirmed_flush_lsn();
> + }
> +
>
> 2a.
> If my suggestions from v25-0002 [1] are adopted then this comment
> needs to change to say like "See atop file pg_upgrade.c..."
>
> 2b.
> Hmm. If my suggestions from v25-0002 [1] are adopted then the version
> checking and the slot counting would *already* be in this calling
> function. In that case, why can't this whole fragment be put in the
> same place? E.g. IIUC there is no reason to call these at checks all
> when the old_cluster slot count is already known to be 0. Similarly,
> there is no reason that both these functions need to be independently
> checking count_logical_slots again since we have already done that
> (again, assuming my suggestions from v25-0002 [1] are adopted).

Currently I did not accept the comment, so they were ignored.

> 3. check_for_lost_slots
>
> +/*
> + * Verify that all logical replication slots are usable.
> + */
> +void
> +check_for_lost_slots(void)
>
> This was forward-declared to be static, but the static function
> modifier is absent here.

Fixed.

> 4. check_for_lost_slots
>
> + /* Quick exit if the cluster does not have logical slots. */
> + if (count_logical_slots() == 0)
> + return;
> +
>
> AFAICT this quick exit can be removed. See my comment #2b.

2b was skipped, so IIUC this is still needed.

> 5. check_for_confirmed_flush_lsn
>
> +check_for_confirmed_flush_lsn(void)
> +{
> + int i,
> + ntups,
> + i_slotname;
> + PGresult *res;
> + DbInfo *active_db = &old_cluster.dbarr.dbs[0];
> + PGconn *conn;
> +
> + /* Quick exit if the cluster does not have logical slots. */
> + if (count_logical_slots() == 0)
> + return;
>
> AFAICT this quick exit can be removed. See my comment #2b.

I kept the style.

> .../t/003_logical_replication_slots.pl
>
> 6.
> +# 2. Temporarily disable the subscription
> +$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
> $old_publisher->stop;
>
> In my previous 0003 review ([2] #10b) I was not questioning the need
> for the $old_publisher->stop; before the pg_upgrade. I was only asking
> why it was done at this location (after the DISABLE) instead of
> earlier.

I see. The reason was to avoid unnecessary error by apply worker.

As the premise, the position of shutting down (before or after the DISABLE) does
not affect the result. But if it puts earlier than DISABLE, the apply worker will
exit with below error because the walsender exits ealier than worker:

```
ERROR: could not send end-of-streaming message to primary: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
no COPY in progress
```

It is not problematic but future readers may be confused if it find.
So I avoided it.

> 7.
> +# Check whether changes on the new publisher get replicated to the subscriber
> +$new_publisher->safe_psql('postgres',
> + "INSERT INTO tbl VALUES (generate_series(11, 20))");
> +$new_publisher->wait_for_catchup('sub');
> +$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
> +is($result, qq(20), 'check changes are shipped to the subscriber');
>
> /shipped/replicated/

You meant to say s/replicated/shipped/, right? Fixed.

[1]: https://www.postgresql.org/message-id/TYAPR01MB5866C6DE11EBC96752CEB7DEF5E2A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2023-08-26 04:28:35 Re: PostgreSQL 16 release announcement draft
Previous Message Hayato Kuroda (Fujitsu) 2023-08-26 04:23:46 RE: [PoC] pg_upgrade: allow to upgrade publisher node