BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: jlucasdba(at)gmail(dot)com
Subject: BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup
Date: 2019-09-02 13:53:47
Message-ID: 15989-bf9dee713231e7b0@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15989
Logged by: James Lucas
Email address: jlucasdba(at)gmail(dot)com
PostgreSQL version: 11.5
Operating system: Centos 7
Description:

After the postmaster is sent SIGKILL while an exclusive backup is in
progress, the cluster becomes unable to open as a hot standby. Reproducible
on (at least) 11.5 and 9.6.15.

Steps to reproduce on a fresh database cluster:

* Initialize new cluster.
initdb -D data

* Start cluster.
pg_ctl -D data start

* Verify hot_standby is enabled (should return "on" on 11.5; on 9.6.15 it
must be enabled first.)
psql -c 'show hot_standby'

* Verify wal_level is set to replica or logical (already the default on
11.5; on 9.6.15 it must be set and the cluster restarted.)
psql -c 'show wal_level'

* Stop cluster.
pg_ctl -D data stop

* Add recovery.conf
echo 'standby_mode = true' >> data/recovery.conf

* Start as hot standby
pg_ctl -D data start

* Validate cluster is running as hot standby (should return true)
psql -c 'select pg_is_in_recovery()'

* Stop cluster
pg_ctl -D data stop

* Remove recovery.conf
rm data/recovery.conf

* Start cluster normally
pg_ctl -D data start

* Start an exclusive backup
psql -c "select pg_start_backup('')"

* Find the PID of the postmaster (main postgres process)
ps -ef | grep 'postgres -D'

* Send SIGKILL to that PID
kill -s KILL <pid>

* Add recovery.conf
echo 'standby_mode = true' >> data/recovery.conf

* Attempt to start cluster
pg_ctl -D data start
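The steps above can be collected into one script. This is a minimal sketch, assuming PostgreSQL 11 binaries (initdb, pg_ctl, psql) on PATH and a throwaway data directory; on 9.6.15, hot_standby and wal_level would first have to be set as described above. The function names are hypothetical. Instead of grepping ps output, it reads the postmaster PID from the first line of postmaster.pid. recovery.conf applies only up to PostgreSQL 11; version 12 replaced it with standby.signal.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps from this report (PostgreSQL 11 layout,
# where standby mode is configured through recovery.conf).

# Print the postmaster PID for a data directory: the first line of
# postmaster.pid is the PID of the running postmaster.
postmaster_pid() {
    head -n 1 "$1/postmaster.pid"
}

# Hypothetical helper that runs every step against a fresh data directory.
reproduce_bug15989() {
    local pgdata=${1:-data}

    initdb -D "$pgdata" || return 1
    pg_ctl -D "$pgdata" -w start || return 1
    psql -Atc 'show hot_standby'    # expect "on" on 11.x
    psql -Atc 'show wal_level'      # expect "replica" or "logical"
    pg_ctl -D "$pgdata" -w stop

    # Confirm the cluster can start as a hot standby before any backup.
    echo 'standby_mode = true' >> "$pgdata/recovery.conf"
    pg_ctl -D "$pgdata" -w start
    psql -Atc 'select pg_is_in_recovery()'   # expect "t"
    pg_ctl -D "$pgdata" -w stop
    rm "$pgdata/recovery.conf"

    # Start normally, begin an exclusive backup, then SIGKILL the postmaster.
    pg_ctl -D "$pgdata" -w start
    psql -Atc "select pg_start_backup('')"
    kill -s KILL "$(postmaster_pid "$pgdata")"

    # Assumed pause so orphaned child processes can notice the postmaster
    # is gone and exit before the next start attempt.
    sleep 5

    # Try to come back as a hot standby; per the report, this never opens,
    # so pg_ctl is expected to give up after the 60-second timeout.
    echo 'standby_mode = true' >> "$pgdata/recovery.conf"
    pg_ctl -D "$pgdata" -w -t 60 start
}
```

Usage: source the file, then run `reproduce_bug15989 data` with a directory name that does not exist yet.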

At this point, the cluster fails to open. On 11.5 pg_ctl hangs waiting for
the database to open, and eventually times out. On 9.6.15, pg_ctl runs
normally, but tailing the database log shows that it never opens. It loops
at "starting up."

I've found that if you stop the instance and remove the recovery.conf, the
database actually will open normally. But even after that, if you go back
and try to open as a hot standby it will fail to open again. I have so far
not been able to find a way to let this cluster open as a hot standby again.
Affected instances had to be restored from backup.
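Before restoring such an instance from backup, it may help to record the state the data directory was left in. An exclusive pg_start_backup() writes a backup_label file into the data directory, and pg_controldata prints backup-related fields from pg_control. The sketch below only collects that information; the function name is hypothetical, and it makes no claim about the cause of the failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether backup_label is present and show the
# backup/recovery fields that pg_controldata prints for the data directory.
show_backup_state() {
    local pgdata=${1:-data}

    if [ -f "$pgdata/backup_label" ]; then
        echo "backup_label present:"
        cat "$pgdata/backup_label"
    else
        echo "no backup_label in $pgdata"
    fi

    # Keep only the lines relevant to crash/backup recovery.
    pg_controldata "$pgdata" | grep -E \
        'Database cluster state|Minimum recovery ending|Backup (start|end) location|End-of-backup record required'
}
```

Usage: run `show_backup_state data` while the cluster is stopped and keep the output alongside the server log.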

This is particularly a problem when running postgres in Docker, as Docker
will send SIGKILL if database shutdown takes more than a few seconds.
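`docker stop` first sends the container's stop signal and, if the container is still running after a grace period (10 seconds by default), sends SIGKILL. One mitigation, sketched here under the assumption that a longer shutdown window is acceptable, is to raise that grace period with `--time`. The helper name `stop_pg_container` and the container name `pg` are hypothetical. This only reduces the chance of a SIGKILL; it does not change the recovery behavior described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a PostgreSQL container, giving the server up to
# $2 seconds (default 120) to shut down before Docker escalates to SIGKILL.
stop_pg_container() {
    docker stop --time "${2:-120}" "$1"
}
```

Usage: `stop_pg_container pg` or `stop_pg_container pg 300`. The same limit can be set when the container is created, via `docker run --stop-timeout 120 ...`.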

Please let me know if you have any questions.

Thanks,
James Lucas
