Re: HOT chain validation in verify_heapam()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-09-09 14:53:47
Message-ID: CA+TgmoZuRyqJ3MQegHrR47Mvp5RWZ9MtJ_wZsPgaoF+qUsUkuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 9, 2022 at 10:00 AM Himanshu Upadhyaya
<upadhyaya(dot)himanshu(at)gmail(dot)com> wrote:
> Approach of introducing a successor array is good but I see one overhead with having both successor and predecessor array, that is, we will traverse each offset on page thrice(one more for original loop on offset) and with each offset we have to retrieve
> /reach an ItemID(PageGetItemId) and Item(PageGetItem) itself. This is not much overhead as they are all preprocessors but there will be some overhead.
> How about having new array(initialised with LP_NOT_CHECKED) of enum LPStatus as below
>
> typedef enum LPStatus
> {
> LP_NOT_CHECKED,
> LP_VALID,
> LP_NOT_VALID
> }LPStatus;
>
> and validating and setting with proper status at three places
> 1) while validating Redirect Tuple
> 2) while validating populating predecessor array and
> 3) at original place of "sanity check"

Well, having to duplicate the logic in three places doesn't seem all
that clean to me. Admittedly, I haven't tried implementing my
proposal, so maybe that doesn't come out very clean either. I don't
know. But I think having the code be clearly correct here is the most
important thing, not shaving a few CPU cycles here or there. It's not
even clear to me that your way would be cheaper, because an "if"
statement is certainly not free, and in fact is probably more
expensive than an extra call to PageGetItem() or PageGetItemId().
Branches are usually more expensive than math. But actually I doubt
that it matters much either way. I think the way to figure out whether
we have a performance problem is to write the code in the
stylistically best way and then test it. There may be no problem at
all, and if there is a problem, it may not be where we think it will
be.

In short, let's apply Knuth's optimization principle.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2022-09-09 14:54:07 Re: configure --with-uuid=bsd fails on NetBSD
Previous Message Aleksander Alekseev 2022-09-09 14:36:45 Re: Summary function for pg_buffercache