Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-22 04:02:42
Message-ID: ZQ0R4kz3gN6bqS1N@paquier.xyz
Lists: pgsql-bugs

On Fri, Sep 22, 2023 at 03:11:54PM +1200, Thomas Munro wrote:
> A thought: commit 8fcb32db prevented us from logging messages that are
> too big to be decoded, but it wasn't back-patched.

Yes, there was one part done in ffd1b6bb6f8 that required ABI
breakages, as well.  There is an argument against a backpatch, as a
set of records in a given range may fail to replay while they were
allowed before (I forgot the exact math, but I recall that this was
for records larger than XLogRecordMaxSize, but still lower than the
maximum allocation limit).
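
As a rough illustration of the range in question, here is a minimal
sketch.  It assumes that XLogRecordMaxSize is 1020MB and that the
allocation limit is MaxAllocSize (0x3fffffff), as I believe the headers
define them on HEAD; the function name and the labels it returns are
hypothetical.  It also ignores the extra space the decoder needs on top
of the record itself, so the real boundaries are not exactly these.

```python
# Assumed values, mirroring what the PostgreSQL headers are believed to
# define on HEAD (not re-verified against any specific branch):
#   XLogRecordMaxSize (access/xlogrecord.h) and MaxAllocSize (utils/memutils.h)
XLOG_RECORD_MAX_SIZE = 1020 * 1024 * 1024
MAX_ALLOC_SIZE = 0x3FFFFFFF


def classify_record_length(total_len: int) -> str:
    """Hypothetical helper: place a record length relative to both limits."""
    if total_len <= XLOG_RECORD_MAX_SIZE:
        # Fine with or without the insertion-time size check.
        return "allowed"
    if total_len <= MAX_ALLOC_SIZE:
        # The range at issue for a backpatch: accepted before the check,
        # refused once the check is in place.
        return "allowed before, refused with the check"
    # Too large for a single allocation in either case.
    return "above the allocation limit"


# Width of the range that a backpatch would start refusing, in bytes.
AFFECTED_RANGE_BYTES = MAX_ALLOC_SIZE - XLOG_RECORD_MAX_SIZE
```

Under these assumptions the affected range is a bit less than 4MB wide,
which is the allocation overhead the lower limit leaves room for.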

> I think that means
> that in older branches, there is a behaviour change unrelated to the
> "garbage bytes" problem discussed in this thread, and separate also
> from the out-of-memory problem. If someone generates a record too big
> to decode, say with pg_logical_emit_message(), we will fail
> differently. Before this patch set, we'd bogusly detect end-of-WAL,
> and after this patch we'd fail to palloc and recovery would bogusly
> fail. Which outcome is more bogus is hard to answer, and clearly we
> should prevent it upstream, but didn't for technical reasons. Do you
> agree that that is a separate topic that doesn't prevent us from
> committing this fix?

I don't see why it's a problem on HEAD: a startup process reacts the
same way for the end of WAL or an OOM.  If we were to FATAL hard on
all stable branches for non-FRONTEND on OOM, which is something we'll
have to do anyway, then this patch set improves the situation because
we would fail in what I see as the better way in this case: with an OOM.
--
Michael
