Re: 64-bit XIDs in deleted nbtree pages

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: 64-bit XIDs in deleted nbtree pages
Date: 2021-02-16 06:52:07
Message-ID: CAH2-WznpoSLp2aJi54tYP_GzYGsLboTiHLOoUJUGhvHOC1DOtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 15, 2021 at 7:26 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Actually, there is one more reason why I bring up idea 1 now: I want
> to hear your thoughts on the index AM API questions now, which idea 1
> touches on. Ideally all of the details around the index AM VACUUM APIs
> (i.e. when and where the extra work happens -- btvacuumcleanup() vs
> btbulkdelete()) won't need to change much in the future. I worry about
> getting this index AM API stuff right, at least a little.

Speaking of problems like this, I think I spotted an old one: we call
_bt_update_meta_cleanup_info() in either btbulkdelete() or
btvacuumcleanup(). I think that we should always call it in
btvacuumcleanup(), though -- even in cases where there is no call to
btvacuumscan() inside btvacuumcleanup() (because btvacuumscan()
happened earlier instead, during the btbulkdelete() call).

This makes the value of IndexVacuumInfo.num_heap_tuples (which is what
we store in the metapage) much more accurate -- right now it's always
pg_class.reltuples from *before* the VACUUM started. And so the
btm_last_cleanup_num_heap_tuples value in a nbtree metapage is often
kind of inaccurate.

This "estimate during ambulkdelete" issue is documented here (kind of):

/*
* Struct for input arguments passed to ambulkdelete and amvacuumcleanup
*
* num_heap_tuples is accurate only when estimated_count is false;
* otherwise it's just an estimate (currently, the estimate is the
* prior value of the relation's pg_class.reltuples field, so it could
* even be -1). It will always just be an estimate during ambulkdelete.
*/
typedef struct IndexVacuumInfo
{
...
}

The name of the metapage field is already
btm_last_cleanup_num_heap_tuples, which already suggests the approach
that I propose now. So why don't we do it like that already?

(Thinks some more...)

I wonder: did this detail change at the last minute during the
development of the feature (just before commit 857f9c36) back in early
2018? That change have made it easier to deal with
oldestBtpoXact/btm_oldest_btpo_xact, which IIRC was a late addition to
the patch -- so maybe it's truly an accident that the code doesn't
work the way that I suggest it should already. (It's annoying to make
state from btbulkdelete() appear in btvacuumcleanup(), unless it's
from IndexVacuumInfo or something -- I can imagine this changing at
the last minute, just for that reason.)

Do you think that this needs to be treated as a bug in the
backbranches, Masahiko? I'm not sure...

In any case we should probably make this change as part of Postgres
14. Don't you think? It's certainly easy to do it this way now, since
there will be no need to keep around a oldestBtpoXact value until
btvacuumcleanup() (in the common case where btbulkdelete() is where we
call btvacuumscan()). The new btm_last_cleanup_num_delpages field
(which replaces btm_oldest_btpo_xact) has a value that just comes from
the bulk stats, which is easy anyway.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takashi Menjo 2021-02-16 07:20:10 Re: [PoC] Non-volatile WAL buffer
Previous Message Michael Paquier 2021-02-16 06:50:31 Re: ERROR: invalid spinlock number: 0