| From: | Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com> |
|---|---|
| To: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
| Subject: | After upgrade from Pg11.2 to 17.7 logical replication prevents database instance shutdown |
| Date: | 2025-12-01 13:39:38 |
| Message-ID: | CAODqTUZXgywhJXGK1UmaWtJDVuzXXUYG4-DTCuV0VkB--+SCWA@mail.gmail.com |
| Lists: | pgsql-general |
Hello,
We have recently upgraded from PostgreSQL 11.2 to PostgreSQL 17.7. We have
logical replication between two database instances; no third-party CDC
consumers are used.
During low traffic on the publisher database, there are no issues, and the
publisher instance shutdown is smooth, as expected.
If we request a shutdown while there is replication lag from the publisher
to the subscriber instance (systemctl stop ...., which is defined in the
systemd unit as
ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop
), the shutdown hangs for exactly 30 minutes, counted from the "received
fast shutdown request" message in the database log, with the log message (
... 0 5029/2736 sub_xxx_usd START_REPLICATION [57P01]:FATAL: terminating connection due to administrator command
).
We have checked the corresponding logs from PG 11.2; there the shutdown
took exactly 60 seconds.
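For reference, the lag condition described above can be checked on the
publisher with something like the sketch below. It is only an illustration:
the PUB_DSN connection string and the helper names are placeholders and not
part of our setup (the database name is taken from the psql prompt).

```shell
#!/bin/sh
# Sketch: show how much WAL each logical slot on the publisher has not yet
# had confirmed by its subscriber.
# PUB_DSN is a hypothetical connection string; adjust it to the publisher.
PUB_DSN="${PUB_DSN:-dbname=powa}"

# Print the SQL that computes the lag per logical replication slot.
lag_query() {
    cat <<'SQL'
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
SQL
}

# Run the query on the publisher (-X skips psqlrc, -f - reads SQL from stdin).
show_lag() {
    lag_query | psql "$PUB_DSN" -X -f -
}
```

Calling show_lag shortly before the shutdown request shows whether the slot
is lagging at that moment.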
We have also tried setting checkpoint_timeout = 27min and archive_timeout =
23min to make sure the delayed shutdown is not related to these parameters,
and the shutdown is still blocked for exactly 30 minutes.
If we disable the subscription, the shutdown is smooth; that is why we
suspect either a change in logical replication or a new configuration
parameter we have missed that would let the publisher instance shut down
cleanly, without the long delay that ends with the sender process on the
publisher instance being terminated.
PostgreSQL version:
PostgreSQL 17.7 (Ubuntu 17.7-3.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0, 64-bit
Timeouts:
publisher instance:
powa=# show wal_sender_timeout;
wal_sender_timeout
--------------------
10min
(1 row)
subscriber instance:
powa=# show wal_receiver_timeout;
wal_receiver_timeout
----------------------
10min
(1 row)
OS version:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
We have found the commit
https://github.com/postgres/postgres/commit/5231ed8262c94936a69bce41f64076630bbd99a2,
but we are not sure whether it applies to the behavior change described
above. Also, the "walsender.c" comment seems to explain that the shutdown
is intentionally postponed. That could take a very long time: in our case
the lag is caused by ETLs and can be about 80GB, so postponing the shutdown
until all of the lag has been sent costs a lot of time. The comment also
does not explain the change from 60 seconds to 30 minutes (no timeout is
mentioned):
* If the server is shut down, checkpointer sends us
* PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
* the backend is idle or runs an SQL query this causes the backend to
* shutdown, if logical replication is in progress all existing WAL records
* are processed followed by a shutdown. Otherwise, this causes the walsender
* to switch to the "stopping" state. In this state, the walsender will reject
* any further replication commands. The checkpointer begins the shutdown
* checkpoint once all walsenders are confirmed as stopping. When the shutdown
* checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
* walsender to send any outstanding WAL, including the shutdown checkpoint
* record, wait for it to be replicated to the standby, and then exit.
Our pipeline requires an instance restart. So far, the only workaround we
have found is to explicitly disable the subscription before initiating the
shutdown, but we consider it a bit fragile compared to the smooth behavior
on Pg11.
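To make the workaround concrete, it could be scripted roughly as in the
sketch below. This is an illustration under assumptions, not our actual
pipeline: SUB_DSN, UNIT and the helper functions are hypothetical, only the
subscription name comes from the log line above. DRY_RUN=1 (the default)
just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: disable the subscription, restart the publisher, re-enable it.
SUB_NAME="${SUB_NAME:-sub_xxx_usd}"                        # name seen in the log line
SUB_DSN="${SUB_DSN:-host=subscriber.example dbname=powa}"  # hypothetical subscriber DSN
UNIT="${UNIT:-postgresql@17-main}"                         # hypothetical systemd unit
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = only print the commands

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 is ENABLE or DISABLE; executed on the subscriber instance.
set_subscription() {
    run psql "$SUB_DSN" -v ON_ERROR_STOP=1 -c "ALTER SUBSCRIPTION $SUB_NAME $1;"
}

# Stop at the first failing step so the publisher is not restarted
# while the subscription is still enabled.
restart_publisher() {
    set_subscription DISABLE || return 1
    run systemctl restart "$UNIT" || return 1
    set_subscription ENABLE
}
```

The fragile part is visible here: if the restart step fails, the
subscription stays disabled until someone re-enables it by hand.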
Is there a way to shorten the 30-minute shutdown so that it is closer to
the Pg11 behavior?
Thanks in advance
Ales Zeleny