Re: Fail hard if xlogreader.c fails on out-of-memory

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fail hard if xlogreader.c fails on out-of-memory
Date: 2023-09-26 23:14:15
Message-ID: ZRNlx1__idb_R_-S@paquier.xyz
Lists: pgsql-hackers

On Wed, Sep 27, 2023 at 11:06:37AM +1300, Thomas Munro wrote:
> I don't have an opinion yet on your other thread about making this
> stuff configurable for replicas, but for the simple crash recovery
> case shown here, hard failure makes sense to me.

Also, if we conclude that we're OK with just failing hard all the time
for crash recovery and archive recovery on OOM, the other patch is not
really required. That would be disruptive for standbys in some cases,
though perhaps still OK in the long term. I am wondering whether people
have actually lost data because of this problem on production systems.
It would not be possible to know that it happened until you see a page
on disk that has a somewhat valid LSN, but one older than the position
currently being inserted, and that could show up in various forms.
Even that could get hidden quickly if WAL is written at a fast pace
after a crash recovery. A standby promotion at an older LSN would be
unlikely, as monitoring solutions discard standbys lagging behind by
more than N bytes.
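
To illustrate the kind of lag check meant here, below is a minimal
sketch, not taken from any particular monitoring tool. It assumes the
usual textual LSN form (two hexadecimal numbers separated by a slash,
high 32 bits first); the function names and the threshold parameter
are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_promotable(primary_lsn: str, standby_replay_lsn: str,
                  max_lag_bytes: int) -> bool:
    """Return True if the standby is at most max_lag_bytes behind the primary.

    Hypothetical helper: a real tool would obtain both positions from the
    servers themselves and would likely apply more criteria than raw lag.
    """
    lag = parse_lsn(primary_lsn) - parse_lsn(standby_replay_lsn)
    return lag <= max_lag_bytes
```

A standby that fails such a check would be dropped from the list of
promotion candidates, which is why a promotion at a much older LSN
should be rare in practice.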

> *A more detailed analysis would talk about sectors (page header is
> atomic), and consider whether we're only trying to defend ourselves
> against recycled pages written by PostgreSQL (yes), arbitrary random
> data (no, but it's probably still pretty good) or someone trying to
> trick us (no, and we don't stand a chance).

WAL would not be the only part of the system that would get borked if
arbitrary bytes can be inserted into what's read from disk, random or
not.
--
Michael
