Re: Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?
Date: 2022-04-18 19:54:36
Message-ID: CAH2-WzmfddQ18Su7Zp6nnBnb3CMRT8Stv8Zg=ic1OmZcRt7Q=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 18, 2022 at 12:41 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If the planner looks at index reltuples at all, it's doing so
> for cost estimation purposes, where the count including dead
> entries is probably the right thing to use.

Then why does heapam_index_build_range_scan do it the other way around?

I think that it probably doesn't matter that much in practice. The
inconsistency should be noted in update_relstats_all_indexes, though.

> If you want to make this cleaner, maybe there's a case for
> splitting reltuples into two columns. But then index AMs
> would be on the hook to determine how many of their entries
> are live, which is not really an index's concern.

The main concern behind this is that we're using
vacrel->new_rel_tuples for the IndexVacuumInfo.num_heap_tuples value
in amvacuumcleanup (but not in ambulkdelete), which is calculated
towards the end of lazy_scan_heap, like so:

/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
vacrel->new_rel_tuples =
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;

I think that this doesn't really belong here; new_rel_tuples should
only be used for VACUUM VERBOSE/server log output, once we return to
heap_vacuum_rel from lazy_scan_heap. We should use
vacrel->new_live_tuples as our IndexVacuumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-04-18 19:59:44 Dump/Restore of non-default PKs
Previous Message Mark Dilger 2022-04-18 19:46:09 Re: Postgres perl module namespace