Re: 64-bit XIDs in deleted nbtree pages

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: 64-bit XIDs in deleted nbtree pages
Date: 2021-02-13 05:04:45
Message-ID: CAH2-WznpLLws36u8jXOY+xe6z4aZ3YAjJb03_BfBVgckNkT7_g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 12, 2021 at 8:38 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I agree that there already are huge problems in that case. But I think
> we need to consider an append-only case as well; after bulk deletion
> on an append-only table, vacuum deletes heap tuples and index tuples,
> marking some index pages as dead and setting an XID into btpo.xact.
> Since we trigger autovacuums even by insertions based on
> autovacuum_vacuum_insert_scale_factor/threshold autovacuum will run on
> the table again. But if there is a long-running query a "wasted"
> cleanup scan could happen many times depending on the values of
> autovacuum_vacuum_insert_scale_factor/threshold and
> vacuum_cleanup_index_scale_factor. This should not happen in the old
> code. I agree this is DBA problem but it also means this could bring
> another new problem in a long-running query case.

I see your point.

This only avoids being a problem with the old code because the oldest
XID in the metapage happens to restrict VACUUM in a way that turns out
to be exactly right for this case. But why assume that? It's actually rather unlikely
that we won't be able to free even one block, even in this scenario.
The oldest XID isn't truly special -- at least not without the
restrictions that go with 32-bit XIDs.

The other thing is that vacuum_cleanup_index_scale_factor is mostly
about limiting how long we'll go before having stale statistics, and
so presumably the user gets the benefit of not having stale statistics
(maybe that theory is a bit questionable in some cases, but that
doesn't have all that much to do with page deletion -- in fact the
problem exists without page deletion ever occurring).

BTW, I am thinking about making recycling take place for pages that
were deleted during the same VACUUM. We can just use a
work_mem-limited array to remember a list of blocks that are deleted
but not yet recyclable (plus the XID found in the block). At the end
of the VACUUM (just before calling IndexFreeSpaceMapVacuum() from
within btvacuumscan()), we can then determine which blocks are now
safe to recycle, and recycle them after all using some "late" calls to
RecordFreeIndexPage() (and without revisiting the pages a second
time). No need to wait for the next VACUUM to recycle pages this way,
at least in many common cases. The reality is that it usually doesn't
take very long for a deleted page to become recyclable -- why wait?
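
A minimal, standalone sketch of the bookkeeping described above, not
actual nbtree code. Everything here is hypothetical and for
illustration only: the PendingPage/PendingList types, the pending_*
function names, and the use of a plain byte budget in place of
work_mem. It also assumes a page is recyclable once its stored 64-bit
XID is older than the oldest XID that any running transaction could
still need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not PostgreSQL's real types. */
typedef uint32_t BlockNum;
typedef uint64_t FullXid;       /* 64-bit XID: plain '<' comparison is safe */

typedef struct PendingPage
{
    BlockNum    blkno;          /* block deleted by this VACUUM */
    FullXid     safexid;        /* XID stored in the deleted page */
} PendingPage;

typedef struct PendingList
{
    PendingPage *pages;
    size_t      npages;
    size_t      maxpages;       /* cap derived from the memory budget */
} PendingList;

/* Size the array from a byte budget (standing in for work_mem). */
int
pending_init(PendingList *list, size_t budget_bytes)
{
    assert(list != NULL);
    list->npages = 0;
    list->maxpages = budget_bytes / sizeof(PendingPage);
    list->pages = NULL;
    if (list->maxpages == 0)
        return 1;               /* nothing can be remembered; still valid */
    list->pages = malloc(list->maxpages * sizeof(PendingPage));
    return list->pages != NULL;
}

/*
 * Remember a page that was deleted but is not yet recyclable.
 * Returns 0 when the array is full; such a page simply waits for the
 * next VACUUM, as it would without this optimization.
 */
int
pending_remember(PendingList *list, BlockNum blkno, FullXid safexid)
{
    assert(list != NULL);
    if (list->npages >= list->maxpages)
        return 0;
    list->pages[list->npages].blkno = blkno;
    list->pages[list->npages].safexid = safexid;
    list->npages++;
    return 1;
}

/*
 * At the end of the scan, collect the blocks that have become
 * recyclable in the meantime, without revisiting the pages.
 * 'out' must have room for list->npages entries.
 */
size_t
pending_collect_recyclable(const PendingList *list, FullXid oldest_needed,
                           BlockNum *out)
{
    size_t      nfound = 0;

    assert(list != NULL && out != NULL);
    for (size_t i = 0; i < list->npages; i++)
    {
        if (list->pages[i].safexid < oldest_needed)
            out[nfound++] = list->pages[i].blkno;
    }
    return nfound;
}

void
pending_free(PendingList *list)
{
    free(list->pages);
    list->pages = NULL;
    list->npages = list->maxpages = 0;
}
```

In the real thing, each block returned by the final pass would be
handed to a "late" RecordFreeIndexPage() call just before
IndexFreeSpaceMapVacuum(). Capping the array by memory means the
worst case degrades to today's behavior rather than to unbounded
memory use.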

This idea is enabled by commit c79f6df75dd from 2018. I think it's the
next logical step.

--
Peter Geoghegan
