RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-09-13 13:52:17
Message-ID: TYAPR01MB5866EF7398CB13FFDBF230E7F5F0A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Amit,

Thank you for reviewing! Before making a patch I can reply the important point.

> 1. One thing to note is that if user checks whether the old cluster is
> upgradable with --check option and then try to upgrade, that will also
> fail. Because during the --check run there would at least one
> additional shutdown checkpoint WAL and then in the next run the slots
> position won't match. Note, I am saying this in context of using
> --check option with not-running old cluster. Won't that be surprising
> to users? One possibility is that we document such a behaviour and
> other is that we go back to WAL reading design where we can ignore
> known WAL records like shutdown checkpoint, XLOG_RUNNING_XACTS, etc.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.

Good catch, we have never considered the case that --check is executed for
stopped cluster. You are right, the old cluster is turned on/off during the
check and it generates SHUTDOWN_CHECKPOINT. This leads that confirmed_flush is
behind the latest checkpoint lsn.

Here are other approaches we came up with:

1. adds WARNING message when the --check is executed and slots are checked.
We can say like:

```
...
Checking for valid logical replication slots
WARNING: this check generated WALs
Next pg_uprade would fail.
Please ensure again that all WALs are replicated.
...
```

2. adds hint message in the FATAL error when the confirmed_flush is not same as
the latest checkpoint:

```
...
Checking for valid logical replication slots fatal

Your installation contains invalid logical replication slots.
These slots can't be copied, so this cluster cannot be upgraded.
Consider removing such slots or consuming the pending WAL if any,
and then restart the upgrade.
If you did pg_upgrade --check before this run, it may be a cause.
Please start clusters and confirm again that all changes are
replicated.
A list of invalid logical replication slots is in the file:
```

3. requests users to do pg_upgrade --check on backup database, if old cluster
has logical slots. Basically they save a whole of cluster before doing pg_uprade,
so it may be acceptable. This is not a modification of codes.

How do others think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2023-09-13 14:06:28 Re: Add 'worker_type' to pg_stat_subscription
Previous Message Peter Eisentraut 2023-09-13 13:48:50 Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting