Re: 64-bit XIDs in deleted nbtree pages

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: 64-bit XIDs in deleted nbtree pages
Date: 2021-02-25 13:42:10
Message-ID: CAD21AoDX7gUaaxA50PK-0fhSK9zhVMZMMVD--1d5XO3rTj0g+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 25, 2021 at 1:42 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Feb 22, 2021 at 2:54 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Good point. I think that the structure should make the page deletion
> > triggering condition have only secondary importance -- it is only
> > described at all to be complete and exhaustive. The
> > vacuum_cleanup_index_scale_factor-related threshold is all that users
> > will really care about in this area.
>
> I pushed the main 64-bit XID commit just now. Thanks!

Awesome!

>
> Attached is v6, with the two remaining patches. No real changes. Just
> want to keep CFBot happy.

Thank you for updating the patch. I'll have a look at them.

>
> I would like to talk about vacuum_cleanup_index_scale_factor some
> more. I didn't get very far with the vacuum_cleanup_index_scale_factor
> documentation (I just removed the existing references to page
> deletion). When I was working on the docs I suddenly wondered: is
> vacuum_cleanup_index_scale_factor actually necessary? Can we not get
> rid of it completely?
>
> The amvacuumcleanup docs seems to suggest that that would be okay:
>
> "It is OK to return NULL if the index was not changed at all during
> the VACUUM operation, but otherwise correct stats should be returned."
>
> Currently, _bt_vacuum_needs_cleanup() gets to decide whether or not
> the index will change during VACUUM (assuming no deleted pages in the
> case of Postgres 11 - 13, or assuming less than ~5% on Postgres 14).
> So why even bother with the heap tuple stuff at all? Why not simply
> remove the triggering logic that uses btm_last_cleanup_num_heap_tuples
> + vacuum_cleanup_index_scale_factor completely? We can rely on ANALYZE
> to set pg_class.reltuples/pg_class.relpages instead. IIUC this is 100%
> allowed by the amvacuumcleanup contract.
>
> I think that the original design that made VACUUM set
> pg_class.reltuples/pg_class.relpages in indexes (from 15+ years ago)
> assumed that it was cheap to handle statistics in passing -- the
> marginal cost was approximately zero, so why not just do it? It was
> not because VACUUM thinks it is valuable or urgent, and yet
> vacuum_cleanup_index_scale_factor seems to assume that it must.
>
> Of course, it may actually be hard/expensive to update the statistics
> due to the vacuum_cleanup_index_scale_factor stuff that was added to
> Postgres 11. The autovacuum_vacuum_insert_threshold stuff that was
> added to Postgres 13 also seems quite relevant. So I think that there
> is an inconsistency here.

btvacuumcleanup() has been playing two roles: recycling deleted pages
and collecting index statistics. Before introducing
vacuum_cleanup_index_scale_factor, btvacuumcleanup() always scanned
the index for both purpose. So it was a problem that we do an index
scan when anti-wraparound vacuum even if the table has not been
changed at all. The motivation of vacuum_cleanup_index_scale_factor is
to decrease the frequency of collecting index statistics (but not to
eliminate it). Since deleted pages could be left by btvacuumcleanup()
skipping an index scan, we introduced btm_oldest_btpo_xact (and it
became unnecessary by commit e5d8a99903).

If we don't want btvacuumcleanup() to collect index statistics, we can
remove vacuum_cleanup_index_scale_factor (at least from btree
perspectives), as you mentioned. One thing that may be worth
mentioning is that the difference between the index statistics taken
by ANALYZE and btvacuumcleanup() is that the former statistics is
always an estimation. That’s calculated by compute_index_stats()
whereas the latter uses the result of an index scan. If
btvacuumcleanup() doesn’t scan the index and always returns NULL, it
would become hard to get accurate index statistics, for example in a
static table case. I've not checked which cases index statistics
calculated by compute_index_stats() are inaccurate, though.

>
> I can see one small problem with my plan of relying on ANALYZE to do
> this: VACUUM ANALYZE trusts amvacuumcleanup/btvacuumcleanup (when
> called by lazyvacuum.c) to set pg_class.reltuples/pg_class.relpages
> within do_analyze_rel() -- even when amvacuumcleanup/btvacuumcleanup
> returns NULL:
>
> /*
> * Same for indexes. Vacuum always scans all indexes, so if we're part of
> * VACUUM ANALYZE, don't overwrite the accurate count already inserted by
> * VACUUM.
> */
> if (!inh && !(params->options & VACOPT_VACUUM))
> {
> for (ind = 0; ind < nindexes; ind++)
> {
> AnlIndexData *thisdata = &indexdata[ind];
> double totalindexrows;
>
> totalindexrows = ceil(thisdata->tupleFract * totalrows);
> vac_update_relstats(Irel[ind],
> RelationGetNumberOfBlocks(Irel[ind]),
> totalindexrows,
> 0,
> false,
> InvalidTransactionId,
> InvalidMultiXactId,
> in_outer_xact);
> }
> }
>
> But this just seems like a very old bug to me. This bug can be fixed
> separately by teaching VACUUM ANALYZE to recognize cases where indexes
> did not have their stats updated in the way it expects.

According to the doc, if amvacuumcleanup/btvacuumcleanup returns NULL,
it means the index is not changed at all. So do_analyze_rel() executed
by VACUUM ANALYZE also doesn't need to update the index statistics
even when amvacuumcleanup/btvacuumcleanup returns NULL. No?

>
> BTW, note that btvacuumcleanup set pg_class.reltuples to 0 in all
> cases following the deduplication commit until my bug fix commit
> 48e12913 (which was kind of a hack itself). This meant that the
> statistics set by btvacuumcleanup (in the case where btbulkdelete
> doesn't get called, the relevant case for
> vacuum_cleanup_index_scale_factor). So it was 100% wrong for months
> before anybody noticed (or at least until anybody complained).
>

Maybe we need more regression tests here.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-02-25 13:58:00 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Amit Kapila 2021-02-25 13:31:33 Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.