Re: recovery modules

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: recovery modules
Date: 2023-03-15 04:13:09
Message-ID: 20230315041309.GA596995@nathanxps13
Lists: pgsql-hackers

I noticed that the new TAP test for basic_archive was failing
intermittently for cfbot. It looks like the query for checking that the
post-backup WAL is restored sometimes executes before archive recovery is
complete (because hot_standby is on). To fix this, I adjusted the test to
use poll_query_until instead. There are no other changes in v14.
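
For readers unfamiliar with the pattern, here is a minimal sketch of what polling for a query result means. It is written in Python rather than the Perl of the TAP framework, and it is not the implementation of poll_query_until. The names poll_until and run_query, and the default timeout and interval values, are assumptions made for illustration.

```python
import time
from typing import Callable


def poll_until(run_query: Callable[[], str],
               expected: str = "t",
               timeout_s: float = 180.0,
               interval_s: float = 0.1) -> bool:
    """Re-run run_query until it returns `expected` or the timeout elapses.

    run_query is a hypothetical callable that executes the check query
    against the restored node and returns its output as text.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Compare after stripping whitespace so a trailing newline is ignored.
        if run_query().strip() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With a one-shot query the test races against archive recovery. With a loop like this, the test only fails if the restored WAL never becomes visible before the timeout.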

I first tried to set hot_standby to off on the restored node so that the
query wouldn't run until archive recovery completed. This seemed like it
would work because start() uses "pg_ctl --wait", which has the following
note in the docs:

Startup is considered complete when the PID file indicates that the
server is ready to accept connections.

However, that's not what happens when hot_standby is off. In that case,
the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery
starts, which wait_for_postmaster_start() interprets as "ready." I see
this was reported before [0], but that discussion fizzled out. IIUC it was
done this way to avoid infinite waits when hot_standby is off and standby
mode is enabled. I could be missing something obvious, but that doesn't
seem necessary when hot_standby is off and recovery mode is enabled because
recovery should end at some point (never mind the halting problem). I'm
still digging into this and may spin off a new thread if I can conjure up a
proposal.
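
The behavior described above can be sketched as follows. This is a simplified Python model, not the pg_ctl source: the function name startup_considered_complete is hypothetical, and the status string values are assumptions standing in for the real constants.

```python
# Assumed stand-ins for the status values written to postmaster.pid.
PM_STATUS_STARTING = "starting"
PM_STATUS_READY = "ready"
PM_STATUS_STANDBY = "standby"


def startup_considered_complete(pm_status: str) -> bool:
    """Model of the wait loop's decision as described in this message.

    Both "ready" and "standby" end the wait, so with hot_standby off the
    wait can finish while archive recovery is still in progress.
    """
    return pm_status.strip() in (PM_STATUS_READY, PM_STATUS_STANDBY)
```

Under this model, a node that has only just begun recovery already counts as started, which is why turning hot_standby off did not make the test's query wait.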

[0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment                                                        Content-Type  Size
v14-0001-Move-extra-code-out-of-the-Pre-PostRestoreComman.patch   text/x-diff   2.1 KB
v14-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-.patch   text/x-diff   4.5 KB
v14-0003-introduce-routine-for-checking-mutually-exclusiv.patch   text/x-diff   2.9 KB
v14-0004-refactor-code-for-restoring-via-shell.patch              text/x-diff   28.2 KB
v14-0005-rename-archive-modules.sgml-to-archive-and-resto.patch   text/x-diff   1.8 KB
v14-0006-restructure-archive-modules-docs-in-preparation-.patch   text/x-diff   11.5 KB
v14-0007-introduce-restore_library.patch                          text/x-diff   70.3 KB
