Re: 64-bit XIDs in deleted nbtree pages

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: 64-bit XIDs in deleted nbtree pages
Date: 2021-03-01 04:07:45
Message-ID: CAD21AoCm8Y8Nd4uoUG9WRbPb3g0vV+UTYg0dJGFk1Y03CM_FNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 26, 2021 at 9:58 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > If we don't want btvacuumcleanup() to collect index statistics, we can
> > remove vacuum_cleanup_index_scale_factor (at least from btree
> > perspectives), as you mentioned. One thing that may be worth
> > mentioning is that the difference between the index statistics taken
> > by ANALYZE and btvacuumcleanup() is that the former statistics is
> > always an estimation. That’s calculated by compute_index_stats()
> > whereas the latter uses the result of an index scan. If
> > btvacuumcleanup() doesn’t scan the index and always returns NULL, it
> > would become hard to get accurate index statistics, for example in a
> > static table case. I've not checked which cases index statistics
> > calculated by compute_index_stats() are inaccurate, though.
>
> The historic context makes it easier to understand what to do here --
> it makes it clear that amvacuumcleanup() routine does not (or should
> not) do any index scan when the index hasn't (and won't) be modified
> by the current VACUUM operation. The relevant sgml doc sentence I
> quoted to you recently ("It is OK to return NULL if the index was not
> changed at all during the VACUUM operation...") was added by commit
> e57345975cf in 2006. Much of the relevant 2006 discussion is here,
> FWIW:
>
> https://www.postgresql.org/message-id/flat/26433.1146598265%40sss.pgh.pa.us#862ee11c24da63d0282e0025abbad19c
>
> So now we have the formal rules for index AMs, as well as background
> information about what various hackers (mostly Tom) were considering
> when the rules were written.
>
> > According to the doc, if amvacuumcleanup/btvacuumcleanup returns NULL,
> > it means the index is not changed at all. So do_analyze_rel() executed
> > by VACUUM ANALYZE also doesn't need to update the index statistics
> > even when amvacuumcleanup/btvacuumcleanup returns NULL. No?
>
> Consider hashvacuumcleanup() -- here it is in full (it hasn't really
> changed since 2006, when it was updated by that same commit I cited):
>
> /*
> * Post-VACUUM cleanup.
> *
> * Result: a palloc'd struct containing statistical info for VACUUM displays.
> */
> IndexBulkDeleteResult *
> hashvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
> {
> Relation rel = info->index;
> BlockNumber num_pages;
>
> /* If hashbulkdelete wasn't called, return NULL signifying no change */
> /* Note: this covers the analyze_only case too */
> if (stats == NULL)
> return NULL;
>
> /* update statistics */
> num_pages = RelationGetNumberOfBlocks(rel);
> stats->num_pages = num_pages;
>
> return stats;
> }
>
> Clearly hashvacuumcleanup() was considered by Tom when he revised the
> documentation in 2006. Here are some observations about
> hashvacuumcleanup() that seem relevant now:
>
> * There is no "analyze_only" handling, just like nbtree.
>
> "analyze_only" is only used by GIN, even now, 15+ years after it was
> added. GIN uses it to make autovacuum workers (never VACUUM outside of
> an AV worker) do pending list insertions for ANALYZE -- just to make
> it happen more often. This is a niche thing -- clearly we don't have
> to care about it in nbtree, even if we make btvacuumcleanup() (almost)
> always return NULL when there was no btbulkdelete() call.
>
> * num_pages (which will become pg_class.relpages for the index) is not
> set when we return NULL -- hashvacuumcleanup() assumes that ANALYZE
> will get to it eventually in the case where VACUUM does no real work
> (when it just returns NULL).
>
> * We also use RelationGetNumberOfBlocks() to set pg_class.relpages for
> index relations during ANALYZE -- it's called when we call
> vac_update_relstats() (I quoted this do_analyze_rel() code to you
> directly in a recent email).
>
> * In general, pg_class.relpages isn't an estimate (because we use
> RelationGetNumberOfBlocks(), both in the VACUUM-updates case and the
> ANALYZE-updates case) -- only pg_class.reltuples is truly an estimate
> during ANALYZE, and so getting a "true count" seems to have only
> limited practical importance.
>
> I think that this sets a precedent in support of my view that we can
> simply get rid of vacuum_cleanup_index_scale_factor without any
> special effort to maintain pg_class.reltuples. As I said before, we
> can safely make btvacuumcleanup() just like hashvacuumcleanup(),
> except when there are known deleted-but-not-recycled pages, where a
> full index scan really is necessary for reasons that are not related
> to statistics at all (of course we still need the *logic* that was
> added to nbtree by the vacuum_cleanup_index_scale_factor commit --
> that is clearly necessary). My guess is that Tom would have made
> btvacuumcleanup() look identical to hashvacuumcleanup() in 2006 if
> nbtree didn't have page deletion to consider -- but that had to be
> considered.

Makes sense. If getting a true pg_class.reltuples is not important in
practice, it seems not to need btvacuumcleanup() do an index scan for
getting statistics purpose.

>
> My reasoning here is also based on the tendency of the core code to
> mostly think of hash indexes as very similar to nbtree indexes.
>
> Even though "the letter of the law" favors removing the
> vacuum_cleanup_index_scale_factor GUC + param in the way I have
> outlined, that is not the only thing that matters -- we must also
> consider "the spirit of the law". Realistically, hash indexes are far
> less popular than nbtree indexes, and so even if I am 100% correct in
> theory, the real world might not be so convinced by my legalistic
> argument. We've already seen the issue with VACUUM ANALYZE (which has
> not been truly consistent with the behavior hashvacuumcleanup() for
> many years). There might be more.
>
> I suppose I could ask Tom what he thinks?

+1

> The hardest question is what
> to do in the backbranches...I really don't have a strong opinion right
> now.

Since it seems not a bug I personally think we don't need to do
anything for back branches. But if we want not to trigger an index
scan by vacuum_cleanup_index_scale_factor, we could change the default
value to a high value (say, to 10000) so that it can skip an index
scan in most cases.

>
> > > BTW, note that btvacuumcleanup set pg_class.reltuples to 0 in all
> > > cases following the deduplication commit until my bug fix commit
> > > 48e12913 (which was kind of a hack itself). This meant that the
> > > statistics set by btvacuumcleanup (in the case where btbulkdelete
> > > doesn't get called, the relevant case for
> > > vacuum_cleanup_index_scale_factor). So it was 100% wrong for months
> > > before anybody noticed (or at least until anybody complained).
> > >
> >
> > Maybe we need more regression tests here.
>
> I agree, but my point was that even a 100% broken approach to stats
> within btvacuumcleanup() is not that noticeable. This supports the
> idea that it just doesn't matter very much if a cleanup-only scan of
> the index never takes place (or only takes place when we need to
> recycle deleted pages, which is generally rare but will become very
> rare once I commit the attached patch).
>
> Also, my fix for this bug (commit 48e12913) was actually pretty bad;
> there are now cases where the btvacuumcleanup()-only VACUUM case will
> set pg_class.reltuples to a value that is significantly below what it
> should be (it all depends on how effective deduplication is with the
> data). I probably should have made btvacuumcleanup()-only VACUUMs set
> "stats->estimate_count = true", purely to make sure that the core code
> doesn't trust the statistics too much (it's okay for VACUUM VERBOSE
> output only). Right now we can get a pg_class.reltuples that is
> "exactly wrong" -- it would actually be a big improvement if it was
> "approximately correct".

Understood. Thank you for your explanation.

>
> Another new concern for me (another concern unique to Postgres 13) is
> autovacuum_vacuum_insert_scale_factor-driven autovacuums.

IIUC the purpose of autovacuum_vacuum_insert_scale_factor is
visibility map maintenance. And as per this discussion, it seems not
necessary to do an index scan in btvacuumcleanup() triggered by
autovacuum_vacuum_insert_scale_factor.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-03-01 04:11:10 Re: doc review for v14
Previous Message Amit Kapila 2021-03-01 03:55:04 Re: Update docs of logical replication for commit ce0fdbfe97.