recovery_min_apply_delay in archive recovery causes assertion failure in latch

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: recovery_min_apply_delay in archive recovery causes assertion failure in latch
Date: 2019-09-29 15:49:03
Message-ID: CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

I got the following assertion failure when I enabled recovery_min_apply_delay
and started archive recovery (i.e., I created only recovery.signal, not
standby.signal).

TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 522)

Here is an example that reproduces the issue:

----------------------------
initdb -D data
pg_ctl -D data start
psql -c "alter system set recovery_min_apply_delay to '60s'"
psql -c "alter system set archive_mode to on"
psql -c "alter system set archive_command to 'cp %p ../arch/%f'"
psql -c "alter system set restore_command to 'cp ../arch/%f %p'"
mkdir arch
pg_basebackup -D bkp -c fast
pgbench -i
pgbench -t 1000
pg_ctl -D data -m i stop
rm -rf bkp/pg_wal
mv data/pg_wal bkp
rm -rf data
mv bkp data
touch data/recovery.signal
pg_ctl -D data -W start
----------------------------

The latch that triggers this assertion failure is recoveryWakeupLatch.
Ownership of this latch is taken only when standby mode is requested.
However, when archive recovery is started with recovery_min_apply_delay
set, the latch is still used even though it is unowned, and that is why
the assertion fails.
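
To make the ownership problem concrete, here is a minimal toy model in C.
It is only a sketch: ToyLatch and every toy_* function are hypothetical
names made up for illustration, not PostgreSQL's real latch code. It
assumes nothing beyond what is described above: ownership is taken only
when standby mode is requested, and using the latch asserts that the
caller owns it.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for a latch; NOT PostgreSQL's real Latch struct. */
typedef struct ToyLatch
{
    pid_t owner_pid;    /* 0 means "nobody owns this latch" */
    bool  is_set;
} ToyLatch;

/* Hypothetical helper: the calling process takes ownership. */
void
toy_own_latch(ToyLatch *latch)
{
    latch->owner_pid = getpid();
}

/* Hypothetical helper: does the calling process own the latch? */
bool
toy_latch_owned_by_me(const ToyLatch *latch)
{
    return latch->owner_pid == getpid();
}

/* Using the latch asserts ownership, like the check that failed. */
void
toy_reset_latch(ToyLatch *latch)
{
    assert(toy_latch_owned_by_me(latch));
    latch->is_set = false;
}

/* Model of startup: ownership is taken only for standby mode. */
void
toy_startup(ToyLatch *latch, bool standby_mode_requested)
{
    latch->owner_pid = 0;
    latch->is_set = false;
    if (standby_mode_requested)
        toy_own_latch(latch);
}

/*
 * Model of the delay path: with a positive delay the latch gets used.
 * Returns true if that use would trip the ownership assertion.
 */
bool
toy_delay_would_trip_assert(const ToyLatch *latch, int delay_ms)
{
    if (delay_ms <= 0)
        return false;           /* no delay, latch never touched */
    return !toy_latch_owned_by_me(latch);
}
```

In this model, toy_startup(&latch, false) followed by a 60000 ms delay
reports that the assertion would trip, while the same delay after
toy_startup(&latch, true) does not.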

The attached patch fixes this issue by making archive recovery always
ignore recovery_min_apply_delay. I think this change is OK because
recovery_min_apply_delay was introduced for standby mode.
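
The idea of ignoring the delay outside standby mode can be sketched as a
small guard in C. This is not the attached patch, whose contents are not
shown here; toy_should_apply_delay and its parameters are hypothetical
names used only to illustrate the decision.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical guard: decide whether replay should wait for the
 * configured minimum apply delay. Not PostgreSQL's actual function.
 */
bool
toy_should_apply_delay(bool standby_mode, int min_apply_delay_ms)
{
    /* Nothing to do when no delay is configured. */
    if (min_apply_delay_ms <= 0)
        return false;

    /* Archive recovery (not standby mode): always ignore the delay. */
    if (!standby_mode)
        return false;

    return true;
}
```

With such a guard, plain archive recovery never reaches the code that
waits on the unowned latch, so the assertion cannot fire on that path.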

This issue is not new in v12; I was able to reproduce it in v11 as well.
So the fix needs to be back-patched.

Regards,

--
Fujii Masao

Attachment Content-Type Size
fix-assertion-failure-in-latch.patch application/octet-stream 494 bytes
