Re: Fail hard if xlogreader.c fails on out-of-memory

From: Noah Misch <noah(at)leadboat(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fail hard if xlogreader.c fails on out-of-memory
Date: 2023-09-27 01:28:30
Message-ID: 20230927012830.GB364510@rfd.leadboat.com
Lists: pgsql-hackers

On Wed, Sep 27, 2023 at 11:06:37AM +1300, Thomas Munro wrote:
> On Tue, Sep 26, 2023 at 8:38 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > Thoughts and/or comments are welcome.
>
> I don't have an opinion yet on your other thread about making this
> stuff configurable for replicas, but for the simple crash recovery
> case shown here, hard failure makes sense to me.

> Recycled pages can't fool us into making a huge allocation any more.
> If xl_tot_len implies more than one page but the next page's
> xlp_pageaddr is too low, then either the xl_tot_len you read was
> recycled garbage bits, or it was legitimate but the overwrite of the
> following page didn't make it to disk; either way, we don't have a
> record, so we have an end-of-wal condition. The xlp_rem_len check
> defends against the second page making it to disk while the first one
> still contains recycled garbage where the xl_tot_len should be*.
>
> What Michael wants to do now is remove the 2004-era assumption that
> malloc failure implies bogus data. It must be pretty unlikely in a 64
> bit world with overcommitted virtual memory, but a legitimate
> xl_tot_len can falsely end recovery and lose data, as reported from a
> production case analysed by his colleagues. In other words, we can
> actually distinguish between lack of resources and recycled bogus
> data, so why treat them the same?

Indeed. Hard failure is fine, and ENOMEM=end-of-WAL definitely isn't.
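
To make the distinction concrete, here is a minimal sketch in C of the decision described in the quoted text. It is not the code in xlogreader.c: the names ReadOutcome, SketchPageHeader, SketchAlloc, failing_alloc and classify_record are hypothetical, the page header is reduced to the two fields discussed above, and any xlp_pageaddr mismatch (not only a too-low value) is treated as "no record here". The point is only the ordering: validate the continuation page before trusting xl_tot_len, and report allocation failure as its own outcome instead of folding it into end-of-WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical outcome codes for this sketch; not PostgreSQL's API. */
typedef enum
{
    READ_OK,            /* record looks real and a buffer was allocated */
    READ_END_OF_WAL,    /* header cross-checks failed: no record here */
    READ_OUT_OF_MEMORY  /* record looks real but allocation failed */
} ReadOutcome;

/* Reduced stand-in for a WAL page header (assumption: two fields only). */
typedef struct
{
    uint64_t pageaddr;  /* plays the role of xlp_pageaddr */
    uint32_t rem_len;   /* plays the role of xlp_rem_len */
} SketchPageHeader;

/* Allocator signature, so a failing allocator can be injected. */
typedef void *(*SketchAlloc)(size_t);

/* Allocator that always fails, to simulate ENOMEM in the sketch. */
static void *
failing_alloc(size_t size)
{
    (void) size;
    return NULL;
}

/*
 * Decide what to do with a record whose header claims tot_len bytes, of
 * which len_on_first_page bytes fit on the page where it starts.  "next"
 * is the header of the following page; it is only consulted when the
 * record spills over.
 */
static ReadOutcome
classify_record(uint32_t tot_len, uint32_t len_on_first_page,
                uint64_t expected_pageaddr, const SketchPageHeader *next,
                SketchAlloc alloc, void **buf)
{
    assert(tot_len > 0);
    *buf = NULL;

    if (tot_len > len_on_first_page)
    {
        assert(next != NULL);

        /* Stale page: tot_len was garbage, or the overwrite was lost. */
        if (next->pageaddr != expected_pageaddr)
            return READ_END_OF_WAL;

        /* Second page is new, but the first page's tot_len is garbage. */
        if (next->rem_len != tot_len - len_on_first_page)
            return READ_END_OF_WAL;
    }

    /* Only now is tot_len trusted enough to size an allocation. */
    *buf = alloc(tot_len);
    if (*buf == NULL)
        return READ_OUT_OF_MEMORY;  /* lack of resources, not bogus data */

    return READ_OK;
}
```

A caller doing crash recovery would stop replay cleanly on READ_END_OF_WAL and fail hard on READ_OUT_OF_MEMORY; passing failing_alloc in place of malloc exercises the second path.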

> *A more detailed analysis would talk about sectors (page header is
> atomic)

I think the page header is atomic on POSIX-compliant filesystems but not
atomic on ext4. That doesn't change the conclusion on $SUBJECT.
