Re: [PATCH] Check dead heap items before marking them unused

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: wufengwufengwufeng(at)gmail(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Check dead heap items before marking them unused
Date: 2026-06-25 16:29:29
Message-ID: CALj2ACU6OpA3qJWt6ctByi4bnrKX=qC8Kk4FE_2eNqKz1_6Wkw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Jun 25, 2026 at 1:32 AM <wufengwufengwufeng(at)gmail(dot)com> wrote:
>
> While reviewing lazy_vacuum_heap_page(), I noticed that the code relies on
> an Assert() before marking items from the dead-items store LP_UNUSED.
>
> That Assert checks that each item is LP_DEAD and has no storage. In
> production builds, however, the check is compiled out. If the invariant is
> ever broken by a future code change, stale dead-items tracking, or page
> corruption, lazy_vacuum_heap_page() can mark an unexpected item unused
> without detecting the problem.

All the dead line-pointers are collected during the heap page scan
(vacrel->dead_items) and their offsets are passed to
lazy_vacuum_heap_page(), which has this invariant as an assert. I
understand that page corruption could theoretically break this
invariant, but that's a rare scenario and without a real use-case it's
hard to argue in favor of adding a runtime check.

> I verified this locally by manually constructing a page with an LP_DEAD item
> that still had storage. On an assert build, VACUUM trips the existing
> Assert. On a production build without the patch, VACUUM succeeds and marks
> the item LP_UNUSED. With the patch, VACUUM reports ERROR before modifying
> the page.
>
> I am not aware of a normal SQL-level path to create this state; the patch is
> intended as a defense-in-depth check around a data-integrity invariant.

I'm having a hard time understanding how this can actually happen in a
production system. For theory's sake: say the heap page scan
identifies a dead item and adds it to the TID list. Before vacuum
reaches lazy_vacuum_heap_page(), this window could be long for tables
of hundreds of GBs or even TBs with many dead TIDs. Now, for the
invariant to break, vacuum's dead TID store would say the line-pointer
is dead, but what's read back from storage doesn't agree - and that is
exactly corruption. But for that page to be read back from storage in
the first place, it has to have been evicted and re-read, and if page
checksums are enabled, wouldn't that corruption already be caught at
read time before we ever reach lazy_vacuum_heap_page()?

> @@ -2777,6 +2777,22 @@ lazy_vacuum_heap_page(LVRelState *vacrel,
> BlockNumber blkno, Buffer buffer,
> VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
> InvalidOffsetNumber);
>
> + /*
> + * Validate all dead items before doing visibility checks or entering
> + * the critical section. The pruning phase should have left every
> + * item in the dead-items store as LP_DEAD with no storage. If this
> + * invariant is broken, by a bug in pruning, tracking, or page
> + * corruption, proceeding would silently free tuple storage.
> + */
> + for (int i = 0; i < num_offsets; i++)
> + {
> + ItemId itemid = PageGetItemId(page, deadoffsets[i]);
> +
> + if (!ItemIdIsDead(itemid) || ItemIdHasStorage(itemid))
> + elog(ERROR, "unexpected non-dead or non-empty item at offset %u in
> block %u of relation \"%s\"",
> + deadoffsets[i], blkno, RelationGetRelationName(vacrel->rel));
> + }
> +

This loop isn't cheap. In the worst case, all line-pointers on a page
are marked dead (a bulk deletion for example), and this loop plus the
same loop below in the critical section would run roughly 2*2K times
the number of heap pages per table.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2026-06-25 16:51:25 Re: Handle concurrent drop when doing whole database vacuum
Previous Message Sami Imseih 2026-06-25 15:48:07 Re: doc: fix pg_stat_autovacuum_scores threshold wording