Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Noah Misch <noah(at)leadboat(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-07-10 20:00:12
Message-ID: 20230710200012.GB3476234@rfd.leadboat.com
Lists: pgsql-bugs

On Mon, May 15, 2023 at 03:38:17PM +1200, Thomas Munro wrote:
> On Fri, May 12, 2023 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > 2023-05-11 20:19:22.248 MSK [2037134] FATAL: invalid memory alloc request size 2021163525
> > 2023-05-11 20:19:22.248 MSK [2037114] LOG: startup process (PID 2037134) exited with exit code 1
>
> Thanks Alexander. Looking into this. I think it is probably
> something like: recycled standby pages are not zeroed (something we
> already needed to do something about[1]), and when we read a recycled
> garbage size (like your "xxxx") at the end of a page at an offset
> where we don't have a full record header on one page, we skip the
> ValidXLogRecordHeader() call (and always did), but the check in
> allocate_recordbuf() which previously handled that "gracefully" (well,
> it would try to allocate up to 1GB bogusly, but it wouldn't try to
> allocate more than that and ereport) is a bit too late. I probably
> need to add an earlier not-too-big validation. Thinking.

I agree about an earlier not-too-big validation. Like the attached? I
haven't tested it with Alexander's recipe or pondered it thoroughly.
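
For illustration only (this is not the attached patch, and every name below is
hypothetical): a minimal sketch of what an "earlier not-too-big validation"
could look like, assuming a 24-byte fixed record header and a cap just under
1 GB on a single backend allocation. The idea is to reject an implausible
total length read from a recycled page before any buffer is sized from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, not the PostgreSQL definitions. */
#define SKETCH_MIN_RECORD_LEN 24u          /* assumed fixed header size */
#define SKETCH_MAX_RECORD_LEN 0x3fffffffu  /* assumed cap, just under 1 GB */

/*
 * Return true if a total record length read from the WAL stream is small
 * enough (and large enough) to be worth allocating a buffer for.  A caller
 * would treat a false result as end-of-WAL or corruption, not as a fatal
 * allocation failure.
 */
static bool
sketch_tot_len_is_plausible(uint32_t tot_len)
{
    return tot_len >= SKETCH_MIN_RECORD_LEN &&
           tot_len <= SKETCH_MAX_RECORD_LEN;
}
```

With these assumed bounds, a garbage length on the order of the 2021163525-byte
request in Alexander's log would be rejected before reaching the allocator.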

> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de

Regarding [1], is it still worth zeroing recycled pages on standbys and/or
reading the whole header before allocating xl_tot_len bytes? (Are there
benefits other than avoiding a 1 GB backend allocation or a 4 GB frontend
allocation, or is that benefit worth the cycles?)

Attachment: xl_tot_len-validate-v1.patch (text/plain, 5.9 KB)
