From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Nitin Motiani <nitinmotiani(at)google(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pgstattuple "unexpected zero page" for gist and hash indexes |
Date: | 2025-09-30 12:01:46 |
Message-ID: | CAFiTN-vDwaX4XA64tNfQkBdsrPdxTR2=Tr10Z27eK4c1Ps5kWg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 30, 2025 at 5:10 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Sep 30, 2025 at 12:58 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Mon, Sep 29, 2025 at 09:03:13AM +0530, Dilip Kumar wrote:
> > > On Tue, Sep 16, 2025 at 1:06 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
> > >> For hash indexes, this error comes from pgstat_hash_page() ->
> > >> _hash_getbuf_with_strategy() -> _hash_checkpage() for a zero page.
> > >> Similarly gistcheckpage() is the cause for gist indexes.
> > >>
> > >> Since after a crash recovery a zero page is normal (as long as not
> > >> referenced from the rest of the index), emitting
> > >> ERRCODE_INDEX_CORRUPTED is a false alarm.
> > >
> > > I agree this seems like a false alarm, and at least its inconsistent
> > > with the btree.
> >
> > It also means that if one is doing a join of pg_class with pgstattuple
> > to gather some diagnostics on a given database, then one is required
> > to wrap pgstattuple into a custom plpgsql function to catch errors,
> > like you did, or avoid these indexes in all cases. This is not
> > user-friendly in both cases.
>
> Right, I agree with that.
>
> > On Tue, Sep 30, 2025 at 10:46:11AM +0530, Nitin Motiani wrote:
> > > On Mon, Sep 29, 2025 at 9:03 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >> I observed that pgstat_btree_page() incorporates the count of new
> > >> pages into its free space calculation[1]. Does it make sense to do the
> > >> same for hash and gist as well as we are leaning towards making these
> > >> consistent.
> > >>
> > >> [1]
> > >> if (PageIsNew(page))
> > >> {
> > >> /* fully empty page */
> > >> stat->free_space += BLCKSZ;
> > >> }
Looking into the V2 below check [1]
My primary concern is the handling of non-new pages where the page's
special size does not match GISTPageOpaqueData (same for
pgstat_hash_page()). I mean isn't that corruption and this should be
reported. OTOH, pgstat_btree_page() is also not reporting it, in fact
it is assuming that if the page is not new then it can safely read the
page opaque data without validating the size. My additional goal with
this patch is to standardize the validation logic across all three
functions to prevent these inconsistencies in the future.
[1]
if (PageIsNew(page))
{
/* fully empty page */
stat->free_space += BLCKSZ;
}
else if (PageGetSpecialSize(page) == MAXALIGN(sizeof(GISTPageOpaqueData)) &&
GistPageIsLeaf(page))
{
pgstat_index_page(stat, page, FirstOffsetNumber,
PageGetMaxOffsetNumber(page));
}
else
{
/* root or node or corrupted */
}
--
Regards,
Dilip Kumar
Google
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2025-09-30 12:04:13 | Re: JIT works only partially with meson build? |
Previous Message | Kevin K Biju | 2025-09-30 11:51:05 | Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath |