Re: Incorrect handling of OOM in WAL replay leading to data loss

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, ethmertz(at)amazon(dot)com, nathandbossart(at)gmail(dot)com, pgsql(at)j-davis(dot)com, sawada(dot)mshk(at)gmail(dot)com
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date: 2023-08-09 06:03:21
Message-ID: ZNMsKS5rfn36t0Bi@paquier.xyz
Lists: pgsql-hackers

On Tue, Aug 08, 2023 at 05:44:03PM +0900, Kyotaro Horiguchi wrote:
> I like the overall direction. Though, I'm considering enclosing the
> errormsg and errorcode in a struct.

Yes, this suggestion makes sense as it simplifies all the WAL routines
that need to report back a complete error state, and there are four of
them now:
XLogPrefetcherReadRecord()
XLogReadRecord()
XLogNextRecord()
DecodeXLogRecord()

I have spent more time on 0001, polishing it and fixing a few bugs
found while reviewing the whole patch. Most of them were related to
the error state not being reset where expected. I have also expanded
DecodeXLogRecord() to use an error structure instead of only an
errmsg, for more consistency. The error state now relies on two
structures:
+typedef enum XLogReaderErrorCode
+{
+    XLOG_READER_NONE = 0,
+    XLOG_READER_OOM,            /* out-of-memory */
+    XLOG_READER_INVALID_DATA,   /* record data */
+} XLogReaderErrorCode;
+
+typedef struct XLogReaderError
+{
+    /* Buffer to hold error message */
+    char       *message;
+    bool        message_deferred;
+    /* Error code when filling *message */
+    XLogReaderErrorCode code;
+} XLogReaderError;

I'm kind of happy with this layer now.

I have also spent some time finding a more elegant solution for WAL
replay, relying on the new facility from 0001. It turns out to be easy
enough to loop on an out-of-memory failure when reading a record during
crash recovery, as that state is actually close to what a standby does.
The trick is to not change the reader's state and to avoid tracking a
continuation record. This is done in 0002, making replay more robust.
With the addition of the error-injection tweak in 0003, I am able to
finish recovery while the startup process loops under memory pressure.
As mentioned previously, there are more code paths to consider, but
this is a start at fixing the data loss problems.

Comments are welcome.
--
Michael

Attachment Content-Type Size
v2-0001-Add-infrastructure-to-report-error-codes-in-WAL-r.patch text/x-diff 41.9 KB
v2-0002-Make-WAL-replay-more-robust-on-OOM-failures.patch text/x-diff 4.4 KB
v2-0003-Tweak-to-force-OOM-behavior-when-replaying-record.patch text/x-diff 1.5 KB
