Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: thomas(dot)munro(at)gmail(dot)com, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-07-17 00:49:05
Message-ID: 20230717004905.GE3826297@rfd.leadboat.com
Lists: pgsql-bugs

On Tue, Jul 11, 2023 at 03:54:14PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 10 Jul 2023 13:00:12 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in
> > On Mon, May 15, 2023 at 03:38:17PM +1200, Thomas Munro wrote:
> > > On Fri, May 12, 2023 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > > > 2023-05-11 20:19:22.248 MSK [2037134] FATAL: invalid memory alloc request size 2021163525
> > > > 2023-05-11 20:19:22.248 MSK [2037114] LOG: startup process (PID 2037134) exited with exit code 1

On my system, testing unpatched REL_15_2, the test reached its first failure
at minute 53 and its second failure at minute 189. Perhaps a faster,
deterministic test would be feasible (a rough sketch follows the list):

- Use pg_logical_emit_message to fill a few segments with 0xFF.
- CHECKPOINT the primary, so the standby recycles segments.
- One more pg_logical_emit_message, computing the length from
pg_current_wal_insert_lsn() such that the new message crosses a segment
boundary and ends 4 bytes before the end of a page.
- Stop the primary.
- If the bug is present, the standby will exit.
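
A rough psql-level sketch of those steps (illustrative only: it assumes a
16MB wal_segment_size and the default 8kB WAL page size, and the length
arithmetic in step 3 is left as a placeholder rather than worked out):

-- 1. Fill a few segments with 0xFF, so segments later recycled on the
--    standby carry non-zero garbage.
SELECT pg_logical_emit_message(false, 'filler',
                               decode(repeat('ff', 1000000), 'hex'))
FROM generate_series(1, 64);

-- 2. Checkpoint the primary, so the standby's next restartpoint recycles
--    those segments.
CHECKPOINT;

-- 3. Derive a message length from pg_current_wal_insert_lsn() such that the
--    record crosses a segment boundary and ends 4 bytes before the end of a
--    page, then emit it.  The real computation has to account for record and
--    page header overhead; :msglen below is only a placeholder.
SELECT pg_current_wal_insert_lsn();
-- SELECT pg_logical_emit_message(false, 'boundary', repeat('x', :msglen));

-- 4. Stop the primary (pg_ctl stop).  With the bug present, the standby's
--    startup process should exit with
--    "FATAL: invalid memory alloc request size ...".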

> > > Thanks Alexander. Looking into this. I think it is probably
> > > something like: recycled standby pages are not zeroed (something we
> > > already needed to do something about[1]), and when we read a recycled
> > > garbage size (like your "xxxx") at the end of a page at an offset
> > > where we don't have a full record header on one page, we skip the
> > > ValidXLogRecordHeader() call (and always did), but the check in
> > > allocate_recordbuf() which previously handled that "gracefully" (well,
> > > it would try to allocate up to 1GB bogusly, but it wouldn't try to
> > > allocate more than that and ereport) is a bit too late. I probably
> > > need to add an earlier not-too-big validation. Thinking.
> >
> > I agree about an earlier not-too-big validation. Like the attached? I
> > haven't tested it with Alexander's recipe or pondered it thoroughly.
>
> I like the patch for its code clean up, but I'm afraid that it removes
> the existing record length check when reading continued
> pages.

The removed check would be redundant in all cases known to me, but I wouldn't
mind keeping it instead.

> However, I'm unsure about merely adding a check for too-long
> records, due to the potential for requesting an excessively large
> amount of memory, even if it will be released shortly.
>
> > > [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> >
> > Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> > reading the whole header before allocating xl_tot_len? (Are there benefits
> > other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> > that benefit worth the cycles?)
>
> I believe reading the whole header is the most sensible approach as it
> can prevent unnecessary memory requests. Another potential solution
> (or hack) for this specific case is to let XLogWalRcvFlush write a
> finalizing ((uint32)0) when dying is true. This stabilizes the
> behavior to "invalid record length.. got 0" when running the TAP test.

> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
>  {
>      Assert(tli != 0);
>
> +    if (dying)
> +    {
> +        int         buflen = sizeof(uint32);
> +        char        buf[sizeof(uint32)];
> +
> +        memset(buf, 0, buflen);
> +        XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> +    }
> +
>      if (LogstreamResult.Flush < LogstreamResult.Write)
>      {
>          WalRcvData *walrcv = WalRcv;

That's inexpensive in cycles, and it should make the failures in question
quite rare. If we do that instead of pre-zeroing, I think we'd still want to
be ready for invalid lengths, for the case of unclean standby shutdown.
