Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-08-15 02:04:49
Message-ID: ZNrdQSWe1flOfHN3@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Aug 14, 2023 at 10:28:54PM +1200, Thomas Munro wrote:
> Something like this... patches for 12 and master attached. Needs a
> lot more testing. I really want to figure out how to get
> deterministic tests...

Interesting, thanks for the quick patch!

/*
* Try to find space to decode this record, if we can do so without
* calling palloc. If we can't, we'll try again below after we've
* validated that total_len isn't garbage bytes from a recycled WAL page.
*/
decoded = XLogReadRecordAlloc(state,
total_len,
false /* allow_oversized */ );

The patch relies on total_len before the header validation is
completed, meaning that this value of total_len could be invalid
because it comes from the partially-read header.. Oh wait, that's
actually OK because an oversized allocation is not allowed yet and the
decode buffer would not be able to store more than what it can?
Perhaps this comment should be updated to tell something like, adding
that total_len can be garbage, but we don't care because we don't
allocate anything that the decode buffer cannot hold.

#ifndef FRONTEND

- /*
- * Note that in much unlucky circumstances, the random data read from a
- * recycled segment can cause this routine to be called with a size
- * causing a hard failure at allocation. For a standby, this would cause
- * the instance to stop suddenly with a hard failure, preventing it to
- * retry fetching WAL from one of its sources which could allow it to move
- * on with replay without a manual restart. If the data comes from a past
- * recycled segment and is still valid, then the allocation may succeed
- * but record checks are going to fail so this would be short-lived. If
- * the allocation fails because of a memory shortage, then this is not a
- * hard failure either per the guarantee given by MCXT_ALLOC_NO_OOM.
- */
if (!AllocSizeIsValid(newSize))
return false;

Wouldn't it be OK to drop entirely this check? I'm fine to keep it as
a safety measure, but it should not be necessary now that it gets
called only once the header is validated?

Regarding the tests, I have been using this formula to produce the
number of bytes until the next page boundary:
select setting::int - ((pg_current_wal_lsn() - '0/0') % setting::int)
from pg_settings where name = 'wal_block_size';

Using pg_logical_emit_message() non-transactional with an empty set of
strings generates records of 56 bytes, so I was thinking about the
following:
- Generate a bunch of small records with pg_logical_emit_message(), or
records based on the threshold with the previous formula.
- Loop until we reach a page limit, at 24 bytes (?).
- Generate one last record to cut through.
- Stop the node in immediate mode.
- Write some garbage bytes on the last page generated to emulate the
recycled contents and an allocation
- Start the node, which should be able to startup.
With wal_level = minimal, autovacuum = off and a large
checkpoint_timeout, any other records are minimized. That's fancy,
though.

Do you think that this could work reliably?
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2023-08-15 02:36:10 Re: BUG #17928: Standby fails to decode WAL on termination of primary
Previous Message ocean_li_996 2023-08-14 17:03:52 Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()