Re: Incorrect handling of OOM in WAL replay leading to data loss

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, ethmertz(at)amazon(dot)com, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date: 2023-08-08 05:52:42
Message-ID: ZNHYKkKpkJ/YBj5a@paquier.xyz

On Tue, Aug 01, 2023 at 04:39:54PM -0700, Jeff Davis wrote:
> On Tue, 2023-08-01 at 16:14 +0300, Aleksander Alekseev wrote:
> > Probably I'm missing something, but if memory allocation is required
> > during WAL replay and it fails, wouldn't it be a better solution to
> > log the error and terminate the DBMS immediately?
>
> We need to differentiate between:
>
> 1. No valid record exists and it must be the end of WAL; LOG and start
> up.
>
> 2. A valid record exists and we are unable to process it (e.g. due to
> OOM); PANIC.

Yes, but there is a bit more to it. The introduction of
palloc(MCXT_ALLOC_NO_OOM) partially originates from this thread, which
reported a problem caused by the switch from malloc() to palloc()
when xlogreader.c was introduced:
https://www.postgresql.org/message-id/CAHGQGwE46cJC4rJGv+kVMV8g6BxHm9dBR_7_QdPjvJUqdt7m=Q@mail.gmail.com

And the malloc() behavior when replaying WAL records is even older
than that.

In the end, we want to give more options to anybody looking
at WAL records, and let them make decisions based on the error reached
and the state of the system. For example, it does not make much sense
to fail hard on OOM when replaying records in standby mode because
we can just loop again. The same can actually be said of crash
recovery. On OOM, the startup process currently considers that we have
an invalid record, which is incorrect. We could fail hard with a FATAL
and replay again (sounds like the natural option), or we could loop
over the record that failed its allocation, retrying it. In any
case, we need to give more information back to the system so that it
can make better decisions about what it should do.
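
To make the distinction concrete, here is a minimal sketch in C of a
reader result that separates "no valid record" from "allocation
failed", so that the caller can choose between ending recovery,
retrying, and failing hard. All names here (ReadOutcome, RecoveryAction,
decide_action, retries_left) are hypothetical and are not part of
xlogreader.c; the bounded-retry policy for crash recovery is only an
assumption used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes; NOT the actual xlogreader.c interface. */
typedef enum ReadOutcome
{
    READ_OK,                /* a valid record was read */
    READ_INVALID_RECORD,    /* no valid record exists: end of WAL */
    READ_OUT_OF_MEMORY      /* allocation failed; the record may be valid */
} ReadOutcome;

/* Hypothetical actions the startup process could take. */
typedef enum RecoveryAction
{
    ACTION_REPLAY,          /* apply the record */
    ACTION_END_RECOVERY,    /* treat as end of WAL: LOG and start up */
    ACTION_RETRY,           /* loop over the same record again */
    ACTION_FAIL_HARD        /* stop, so that replay is attempted again later */
} RecoveryAction;

/*
 * Decide what to do with the outcome of a record read.
 *
 * Assumed policy (illustration only): in standby mode an OOM is always
 * retried; in crash recovery it is retried while retries_left > 0 and
 * fails hard afterwards.  An OOM is never reported as end of WAL.
 */
RecoveryAction
decide_action(ReadOutcome outcome, bool standby_mode, int retries_left)
{
    switch (outcome)
    {
        case READ_OK:
            return ACTION_REPLAY;
        case READ_INVALID_RECORD:
            return ACTION_END_RECOVERY;
        case READ_OUT_OF_MEMORY:
            if (standby_mode || retries_left > 0)
                return ACTION_RETRY;
            return ACTION_FAIL_HARD;
    }

    /* Not reached for valid enum values; fail safe rather than lose data. */
    assert(false && "unhandled ReadOutcome");
    return ACTION_FAIL_HARD;
}
```

The point of the sketch is only that READ_OUT_OF_MEMORY and
READ_INVALID_RECORD travel back to the caller as different values; which
action is taken for each is a policy choice that remains open here.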
--
Michael
