Re: 64-bit XIDs in deleted nbtree pages

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: 64-bit XIDs in deleted nbtree pages
Date: 2021-02-25 04:42:15
Message-ID: CAH2-WznUWHOL+nYYT2PLsn+3OWcq8OBfA1sB3FX885rE=ZQVvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 22, 2021 at 2:54 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Good point. I think that the structure should make the page deletion
> triggering condition have only secondary importance -- it is only
> described at all to be complete and exhaustive. The
> vacuum_cleanup_index_scale_factor-related threshold is all that users
> will really care about in this area.

I pushed the main 64-bit XID commit just now. Thanks!

Attached is v6, with the two remaining patches. No real changes. Just
want to keep CFBot happy.

I would like to talk about vacuum_cleanup_index_scale_factor some
more. I didn't get very far with the vacuum_cleanup_index_scale_factor
documentation (I just removed the existing references to page
deletion). When I was working on the docs I suddenly wondered: is
vacuum_cleanup_index_scale_factor actually necessary? Can we not get
rid of it completely?

The amvacuumcleanup docs seems to suggest that that would be okay:

"It is OK to return NULL if the index was not changed at all during
the VACUUM operation, but otherwise correct stats should be returned."

Currently, _bt_vacuum_needs_cleanup() gets to decide whether or not
the index will change during VACUUM (assuming no deleted pages in the
case of Postgres 11 - 13, or assuming less than ~5% on Postgres 14).
So why even bother with the heap tuple stuff at all? Why not simply
remove the triggering logic that uses btm_last_cleanup_num_heap_tuples
+ vacuum_cleanup_index_scale_factor completely? We can rely on ANALYZE
to set pg_class.reltuples/pg_class.relpages instead. IIUC this is 100%
allowed by the amvacuumcleanup contract.

I think that the original design that made VACUUM set
pg_class.reltuples/pg_class.relpages in indexes (from 15+ years ago)
assumed that it was cheap to handle statistics in passing -- the
marginal cost was approximately zero, so why not just do it? It was
not because VACUUM thinks it is valuable or urgent, and yet
vacuum_cleanup_index_scale_factor seems to assume that it must.

Of course, it may actually be hard/expensive to update the statistics
due to the vacuum_cleanup_index_scale_factor stuff that was added to
Postgres 11. The autovacuum_vacuum_insert_threshold stuff that was
added to Postgres 13 also seems quite relevant. So I think that there
is an inconsistency here.

I can see one small problem with my plan of relying on ANALYZE to do
this: VACUUM ANALYZE trusts amvacuumcleanup/btvacuumcleanup (when
called by lazyvacuum.c) to set pg_class.reltuples/pg_class.relpages
within do_analyze_rel() -- even when amvacuumcleanup/btvacuumcleanup
returns NULL:

/*
* Same for indexes. Vacuum always scans all indexes, so if we're part of
* VACUUM ANALYZE, don't overwrite the accurate count already inserted by
* VACUUM.
*/
if (!inh && !(params->options & VACOPT_VACUUM))
{
for (ind = 0; ind < nindexes; ind++)
{
AnlIndexData *thisdata = &indexdata[ind];
double totalindexrows;

totalindexrows = ceil(thisdata->tupleFract * totalrows);
vac_update_relstats(Irel[ind],
RelationGetNumberOfBlocks(Irel[ind]),
totalindexrows,
0,
false,
InvalidTransactionId,
InvalidMultiXactId,
in_outer_xact);
}
}

But this just seems like a very old bug to me. This bug can be fixed
separately by teaching VACUUM ANALYZE to recognize cases where indexes
did not have their stats updated in the way it expects.

BTW, note that btvacuumcleanup set pg_class.reltuples to 0 in all
cases following the deduplication commit until my bug fix commit
48e12913 (which was kind of a hack itself). This meant that the
statistics set by btvacuumcleanup (in the case where btbulkdelete
doesn't get called, the relevant case for
vacuum_cleanup_index_scale_factor). So it was 100% wrong for months
before anybody noticed (or at least until anybody complained).

Am I missing something here?

--
Peter Geoghegan

Attachment Content-Type Size
v6-0002-Show-pages-newly-deleted-in-VACUUM-VERBOSE-output.patch text/x-patch 8.7 KB
v6-0001-Recycle-pages-deleted-during-same-VACUUM.patch text/x-patch 19.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-25 04:52:41 Re: Single transaction in the tablesync worker?
Previous Message Dilip Kumar 2021-02-25 04:19:15 Re: Is Recovery actually paused?