Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date: 2021-12-06 20:07:44
Message-ID: CA+TgmobFv_X485LjFxz4OtkvKHXBAngGVvkGDtPnFt1tbnF2tg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 5, 2021 at 12:28 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I wonder why we're counting the number of dead tuples (or LP_DEAD stub
> items) in the relation as a whole in ANALYZE's acquire_sample_rows()
> function. Wouldn't it make more sense to focus on the "live vs dead
> tuple properties" of heap pages that are not known to be all-visible
> when we generate statistics for our pgstat_report_analyze() report?
> These statistic collector stats are only for the benefit of autovacuum
> scheduling -- and so they're *consumed* in a way that is totally
> different to the nearby pg_statistic stats.

I think this could be the right idea. I'm not certain that it is, but
it does sound believable.

> This new approach also buys us the ability to extrapolate a new
> estimated number of dead tuples using old, stale stats. The stats can
> be combined with the authoritative/known number of not-all-visible
> pages right this second, since it's cheap enough to *accurately*
> determine the total number of not-all-visible pages for a heap
> relation by calling visibilitymap_count(). My guess is that this would
> be much more accurate in practice: provided the original average
> number of dead/live tuples (tuples per not-all-visible block) was
> still reasonably accurate, the extrapolated "total dead tuples right
> now" values would also be accurate.

So does this. If some of the table is now all-visible when it wasn't
before, it's certainly a good guess that the portions that still
aren't have about the same distribution of dead tuples that they did
before ... although the other direction is less clear: it seems
possible that newly not-all-visible pages have fewer dead tuples than
ones which have been not-all-visible for a while. But you have to make
some guess.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-12-06 20:30:28 Re: ExecTypeSetColNames is fundamentally broken
Previous Message Bossart, Nathan 2021-12-06 19:32:40 Re: Do we need pre-allocate WAL files during end-of-recovery checkpoint?