Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date: 2021-12-07 20:27:36
Message-ID: CA+TgmoZnKXf4YPdtzWam+WoF7B-jjsDCZMG5x+j+wnCci6-oXQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 7, 2021 at 2:13 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> For example, why should we count dead heap-only tuples from earlier in
> a HOT chain, even when we see no evidence that opportunistic HOT
> pruning can't keep up on that page? Since we actually care about the
> direction of things, not just the present state of things, we'd be
> justified in completely ignoring those dead tuples. Similarly, it
> might well make sense to give more weight to concentrations of LP_DEAD
> items on a page -- that is a signal that things are not going well *at
> the level of the page*. Not so much when you have a few LP_DEAD stubs,
> but certainly when you have dozens of them on one page, or even
> hundreds. And so ISTM that the conditions of the page should influence
> how we interpret/count that page's dead tuples, in both directions
> (interpret the page as having more dead tuples, or fewer).

Well... I mean, I think we're almost saying the same thing, then, but
I think you're saying it more confusingly. I have no objection to
counting the number of dead HOT chains rather than the number of dead
tuples, because that's what affects the index contents, but there's no
need to characterize that as "not the literal truth." There's nothing
fuzzy or untrue about it if we simply say that's what we're doing.
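To make the distinction concrete, here is a minimal sketch under a deliberately simplified, hypothetical model of a heap page. Each HOT chain is a tuple of dead/live flags, root version first, and every version in a chain shares the single index entry that points at the chain's root. The names `HotChain`, `count_dead_tuples`, and `count_dead_hot_chains` are illustrative only and do not correspond to anything in PostgreSQL's source. The model also ignores line pointer states (redirects, LP_DEAD stubs) entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotChain:
    """Hypothetical toy model of one HOT chain on a heap page.

    dead[i] is True if the i-th tuple version (root first) is dead to all
    transactions. All versions share the one index entry that points at
    the chain's root line pointer.
    """
    dead: tuple[bool, ...]


def count_dead_tuples(chains: list[HotChain]) -> int:
    """Count every dead tuple version, including dead heap-only tuples
    earlier in a chain whose newest version is still live."""
    return sum(sum(c.dead) for c in chains)


def count_dead_hot_chains(chains: list[HotChain]) -> int:
    """Count chains in which every version is dead. In this toy model,
    only these leave an index entry that leads to no live tuple, so only
    these create work for index vacuuming."""
    return sum(1 for c in chains if c.dead and all(c.dead))


page = [
    HotChain(dead=(True, True, False)),  # two dead versions, newest live
    HotChain(dead=(True,)),              # single dead tuple
    HotChain(dead=(True, True)),         # whole chain dead
]

print(count_dead_tuples(page))      # -> 5
print(count_dead_hot_chains(page))  # -> 2
```

On this toy page the two metrics diverge: the first chain contributes two dead tuples but no dead chain, because its live newest version still justifies the index entry, and opportunistic pruning could reclaim the dead versions without touching any index. Each number is an exact count of what it claims to count; they simply answer different questions.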

> Right. And as I keep saying, the truly important thing is to not
> *completely* ignore any relevant dimension of cost. I just don't want
> to ever be wildly wrong -- not even once. We can tolerate being
> somewhat less accurate all the time (not that we necessarily have to
> make a trade-off), but we cannot tolerate pathological behavior. Of
> course I include new/theoretical pathological behaviors here (not just
> the ones we know about today).

Sure, but we don't *need* to be less accurate, and I don't think we
even *benefit* from being less accurate. If we do something like count
dead HOT chains instead of dead tuples, let's not call that a
less-accurate count of dead tuples. Let's call it an accurate count of
dead HOT chains.

--
Robert Haas
EDB: http://www.enterprisedb.com
