Re: HOT chain validation in verify_heapam()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2023-03-22 21:56:22
Message-ID: 20230322215622.cegwxejsvrd2sail@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
> > <aleksander(at)timescale(dot)com> wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
>
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.
>
> At the very least there's missing verification that tids actually exists in the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.

It's not quite so simple - I see now that the lp_valid check tries to prevent
that. However, it's not sufficient, because there is no guarantee that
lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap
tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't
be initialized.

Why are redirections now checked in two places? There already was a
ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's
the ItemIdIsRedirected() check in the "Update chain validation." loop as well
- and the output of that is confusing, because it'll just mention the target
of the redirect.

I also think it's not quite right that some of checks inside if
(ItemIdIsRedirected()) continue in case of corruption, others don't. While
there's a later continue, that means the corrupt tuples get added to the
predecessor array. Similarly, in the non-redirect portion, the successor
array gets filled with corrupt tuples, which doesn't seem quite right to me.

A patch addressing some, but not all, of those is attached. With that I don't
see any crashes or false-positives anymore.

Greetings,

Andres Freund

Attachment Content-Type Size
verify_heapam.diff text/x-diff 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-03-22 22:07:57 Re: HOT chain validation in verify_heapam()
Previous Message David G. Johnston 2023-03-22 21:50:07 Re: Options to rowwise persist result of stable/immutable function with RECORD result