Re: Incorrect handling of OOM in WAL replay leading to data loss

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, ethmertz(at)amazon(dot)com, nathandbossart(at)gmail(dot)com, pgsql(at)j-davis(dot)com, sawada(dot)mshk(at)gmail(dot)com
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date: 2023-08-09 08:44:49
Message-ID: ZNNSAaKa5w6MNTDY@paquier.xyz
Lists: pgsql-hackers

On Wed, Aug 09, 2023 at 05:00:49PM +0900, Kyotaro Horiguchi wrote:
> Looks fine.

Okay, I've updated the patch accordingly. I'll look at 0001 again
at the beginning of next week.

> While it's a kind of bug in total, we encountered a case where an
> excessively large xl_tot_len actually came from a corrupted
> record. [1]

Right, I remember this one. I think that Thomas was pretty much right
that this could be caused by a lack of zeroing in the WAL pages.

> I'm glad to see this infrastructure comes in, and I'm on board with
> retrying due to an OOM. However, I think we really need official steps
> to wrap up recovery when there is a truly broken, oversized
> xl_tot_len.

There are a few options on the table, only doable once the WAL reader
provides the error state to the startup process:
1) Retry a few times and FATAL.
2) Just FATAL immediately and don't wait.
3) Retry and hope for the best that the host calms down.
I have not seen this being much of a problem in the field, so
perhaps option 2 with the structure of 0002 and a FATAL when we catch
XLOG_READER_OOM in the switch would be enough. At least that's enough
for the cases we've seen. I'll think a bit more about it, as well.
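
To make the three options concrete, here is a minimal standalone sketch
of the decision the startup process would take once the reader reports
its error state. This is not the code from 0002. Only XLOG_READER_OOM is
a name from this thread; the other enum values, the policy and action
types, MAX_OOM_RETRIES and decide_replay_action() are hypothetical names
made up for illustration.

```c
#include <assert.h>

/* Error state reported by the WAL reader. Only XLOG_READER_OOM is named
 * in this thread; the other two values are assumptions. */
typedef enum
{
    XLOG_READER_NO_ERROR,       /* hypothetical: record read fine */
    XLOG_READER_OOM,            /* allocation for the record failed */
    XLOG_READER_INVALID_DATA    /* hypothetical: record failed validation */
} XLogReaderErrorSketch;

/* The three options listed above (hypothetical names). */
typedef enum
{
    OOM_RETRY_THEN_FATAL,       /* option 1 */
    OOM_FATAL_IMMEDIATELY,      /* option 2 */
    OOM_RETRY_FOREVER           /* option 3 */
} OomPolicy;

/* What the caller should do next (hypothetical names). */
typedef enum
{
    ACTION_CONTINUE,            /* replay the record, read the next one */
    ACTION_RETRY,               /* try to read the same record again */
    ACTION_FATAL,               /* stop recovery with a FATAL */
    ACTION_END_OF_WAL           /* treat as the end of WAL */
} ReplayAction;

#define MAX_OOM_RETRIES 3       /* hypothetical retry limit for option 1 */

/*
 * oom_failures is the number of OOM failures already seen for this
 * record before the current one.
 */
ReplayAction
decide_replay_action(XLogReaderErrorSketch err, OomPolicy policy,
                     int oom_failures)
{
    assert(oom_failures >= 0);

    switch (err)
    {
        case XLOG_READER_NO_ERROR:
            return ACTION_CONTINUE;

        case XLOG_READER_OOM:
            /* An OOM must never be mistaken for the end of WAL. */
            switch (policy)
            {
                case OOM_RETRY_THEN_FATAL:
                    return (oom_failures < MAX_OOM_RETRIES) ?
                        ACTION_RETRY : ACTION_FATAL;
                case OOM_FATAL_IMMEDIATELY:
                    return ACTION_FATAL;
                case OOM_RETRY_FOREVER:
                    return ACTION_RETRY;
            }
            return ACTION_FATAL;    /* unknown policy: be conservative */

        case XLOG_READER_INVALID_DATA:
            /* Assumption: invalid data is handled as the end of WAL. */
            return ACTION_END_OF_WAL;
    }
    return ACTION_FATAL;            /* unknown error state */
}
```

With option 2, decide_replay_action(XLOG_READER_OOM,
OOM_FATAL_IMMEDIATELY, 0) returns ACTION_FATAL on the first failure,
which is the simplest behavior and never loops on a record the host
cannot afford.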

Yeah, agreed. That's orthogonal to the issue reported by Ethan,
unfortunately, where he was able to trigger the issue of this thread
by manipulating the sizing of a host after producing a record larger
than what the host could afford after the resizing :/
--
Michael

Attachment                                                       Content-Type  Size
v3-0001-Add-infrastructure-to-report-error-codes-in-WAL-r.patch  text/x-diff   41.9 KB
v3-0002-Make-WAL-replay-more-robust-on-OOM-failures.patch        text/x-diff   4.4 KB
v3-0003-Tweak-to-force-OOM-behavior-when-replaying-record.patch  text/x-diff   1.5 KB
