| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Alena Vinter <dlaaren8(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Startup PANIC on standby promotion due to zero-filled WAL segment |
| Date: | 2025-12-24 05:55:58 |
| Message-ID: | aUuAbs_j2ifwvkky@paquier.xyz |
| Lists: | pgsql-hackers |
On Tue, Dec 23, 2025 at 08:49:20PM +0700, Alena Vinter wrote:
> Michael, I left my pipeline running the TAP test until it failed — and
> after some time, it did fail. I then changed the test slightly, and simply
> by adding a short sleep, I was able to reproduce the same failure more
> reliably. Moreover, attempting to restart the standby server after a failed
> promotion triggers startup PANIC again.
This is a better argument, yes. ProcessInterrupts() is just a way to
force the WAL receiver to do nothing. We could see the same thing if a
WAL receiver repeatedly fails a palloc() or another allocation, shutting
it down before it is able to stream any changes. We could also have a
test with an injection point that forces an error based on a specific
timeline number, or something like that.
Hmm. As in the case where the WAL receiver is not able to connect
to a primary, shouldn't we prevent the promotion request from being
processed at all? So while you have your finger on something here, I
don't think that your suggested solution is either a good or a correct
one. It sounds to me that the startup process assumes that the WAL
receiver is doing some work, then the promotion request comes in and
we assume that it is OK to proceed with the promotion, while we should
obviously not do so, because the WAL receiver has streamed zero
contents from TLI 2.

It sounds to me that we should let the startup process know that
something is wrong with the WAL receiver, meaning that it may be up to
the WAL receiver to save some information in shared memory so that the
startup process does not allow the promotion to go through at all.
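A minimal C sketch of the shared-memory handshake suggested above, under loudly stated assumptions: the struct `WalRcvProgress` and every function below are hypothetical names invented for illustration, not PostgreSQL's actual WAL receiver API, and plain C11 atomics stand in for PostgreSQL's own shared-memory and locking primitives. The WAL receiver records whether it exited before streaming any WAL, and the startup process consults that flag before honoring a promotion request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * HYPOTHETICAL shared-memory state published by the WAL receiver.
 * These names do not exist in PostgreSQL; they only illustrate the idea.
 */
typedef struct WalRcvProgress
{
    _Atomic uint32_t streaming_tli;        /* timeline being streamed, 0 if none (diagnostics only) */
    _Atomic uint64_t bytes_received;       /* WAL bytes received on streaming_tli */
    _Atomic bool     exited_before_stream; /* receiver stopped before receiving any WAL */
} WalRcvProgress;

static void
walrcv_progress_init(WalRcvProgress *p)
{
    atomic_init(&p->streaming_tli, 0);
    atomic_init(&p->bytes_received, 0);
    atomic_init(&p->exited_before_stream, false);
}

/* WAL receiver side: streaming starts on timeline tli. */
static void
walrcv_begin_timeline(WalRcvProgress *p, uint32_t tli)
{
    assert(tli > 0);
    atomic_store(&p->bytes_received, 0);
    atomic_store(&p->exited_before_stream, false);
    atomic_store(&p->streaming_tli, tli);
}

/* WAL receiver side: nbytes of WAL have been received. */
static void
walrcv_report_received(WalRcvProgress *p, uint64_t nbytes)
{
    atomic_fetch_add(&p->bytes_received, nbytes);
}

/* WAL receiver side: called on exit, whether by error or interrupt. */
static void
walrcv_report_exit(WalRcvProgress *p)
{
    if (atomic_load(&p->bytes_received) == 0)
        atomic_store(&p->exited_before_stream, true);
}

/* Startup process side: may a pending promotion request go through? */
static bool
promotion_allowed(WalRcvProgress *p)
{
    return !atomic_load(&p->exited_before_stream);
}
```

What the startup process should do when the check fails (keep waiting, restart the WAL receiver, or report an error) is left open here; the sketch only shows where such a check could sit.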
--
Michael