Re: Assert in pageinspect with NULL pages

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daria Lepikhova <d(dot)lepikhova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assert in pageinspect with NULL pages
Date: 2022-03-27 16:21:42
Message-ID: CAEze2WgaJD14Z_Pd3NJ89qxgUHbU69QDhex_syH+W92VLZm2=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 27 Mar 2022 at 16:24, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Mar 25, 2022 at 12:57 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > On Thu, Mar 24, 2022 at 08:54:14PM -0700, Peter Geoghegan wrote:
> > > amcheck's palloc_btree_page() function validates that an 8KiB page is
> > > in fact an nbtree page, in a maximally paranoid way. Might be an
> > > example worth following here.
> >
> > Sure, we could do that. However, I don't think that going down to
> > that is something we have any need for in pageinspect, as the module
> > is also useful to look at the contents of the full page, even if
> > slightly corrupted, and too many checks would prevent a lookup at the
> > internal contents of a page.
>
> It's my impression that there are many ways of crashing the system
> using pageinspect right now. We aren't very careful about making sure
> that our code that reads from disk doesn't crash if the data on disk
> is corrupted, and all of that systemic weakness is inherited by
> pageinspect. But, with pageinspect, you can supply your own pages, not
> just use the ones that actually exist on disk.
>
> Fixing this seems like a lot of work and problematic for various
> reasons. And I do agree that if we can allow people to look at what's
> going on with a corrupt page, that's useful. On the other hand, if by
> "looking" they crash the system, that sucks a lot. So overall I think
> we need a lot more sanity checks here.

I noticed this thread due to recent commit 291e517a, and noticed that
it has some overlap with one of my patches [0], in which I fix the
corresponding issue in core postgres by assuming (and in
assert-enabled builds, verifying) the size and location of the special
area. As such, you might be interested in that patch.

Note that currently in core postgres a corrupted special area pointer
will likely target neighbouring blocks in the buffer pool; resulting
in spreading corruption when the special area is updated. This
spreading corruption should be limited to only the corrupted block
with my patch.

Kind regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/37/3543/
https://www.postgresql.org/message-id/CAEze2WjE3+tGO9Fs9+iZMU+z6mMZKo54W1Zt98WKqbEUHbHOBg@mail.gmail.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-03-27 16:31:23 Re: Why is lorikeet so unstable in v14 branch only?
Previous Message Jaime Casanova 2022-03-27 16:07:27 Re: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel