Re: pg_amcheck contrib application

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-29 20:06:45
Message-ID: CA+TgmoZ3dkrFvtOraGgbH_uVQVjoOWQpmFDzq_EmcgmYwSfoPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 29, 2021 at 1:45 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> Thanks! The attached patch addresses your comments here and in your prior email. In particular, this patch changes the tuple visibility logic to not check tuples for which the inserting transaction aborted or is still in progress, and to not check toast for tuples deleted in transactions older than our transaction snapshot's xmin. A list of toasted attributes which are safe to check is compiled per main table page during the scan of the page, then checked after the buffer lock on the main page is released.
>
> In the perhaps unusual case where verify_heapam() is called in a transaction which has also added tuples to the table being checked, this patch's visibility logic chooses not to check such tuples. I'm on the fence about this choice, and am mostly following your lead. I like that this decision maintains the invariant that we never check tuples which have not yet been committed.
>
> The patch includes a bit of refactoring. In the old code, heap_check() performed clog bounds checking on xmin and xmax prior to calling check_tuple_header_and_visibilty(), but I think that's not such a great choice. If the tuple header is garbled to have random bytes in the xmin and xmax fields, and we can detect that situation because other tuple header fields are garbled in detectable ways, I'd rather get a report about the header being garbled than a report about the xmin or xmax being out of bounds. In the new code, the tuple header is checked first, then the visibility is checked, then the tuple is checked against the current relation description, then the tuple attributes are checked. I think the layout is easier to follow, too.

Hmm, so this got ~10x bigger from my version. Could you perhaps
separate it out into a series of patches for easier review? Say, one
that just fixes the visibility logic, and then a second to avoid doing
the TOAST check with a buffer lock held, and then more than that if
there are other pieces that make sense to separate out?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-03-29 20:16:42 Re: UniqueKey on Partitioned table.
Previous Message Isaac Morland 2021-03-29 19:32:56 Add missing function abs (interval)