Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-18 16:45:53
Message-ID: CA+TgmoZXEVxEbZjQfXyjo1VQ2osmQYKaXh+=wOqwu0k8wWSM2g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 18, 2024 at 11:17 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> True. But the way that PageGetHeapFreeSpace() returns 0 for a page
> with 291 LP_DEAD stubs is a much older behavior. When that happens it
> is literally true that the page has lots of free space. And yet it's
> not free space we can actually use. Not until those LP_DEAD items are
> marked LP_UNUSED.

To me, this is just accurate reporting. What we care about in this
context is the amount of free space on the page that can be used to
store a new tuple. When there are no line pointers available to be
allocated, that amount is 0.
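A simplified C model may help make the "no line pointers available" case concrete. This is a sketch, not PostgreSQL source. `SketchPage`, the `SKETCH_*` names and `sketch_heap_free_space()` are hypothetical. The 291 limit assumes 8 KB pages. The real `PageGetHeapFreeSpace()` also has details that are left out here, such as a page-level hint about free line pointers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: 8 KB pages, for which PostgreSQL's MaxHeapTuplesPerPage is 291.
 * Everything prefixed "Sketch"/"SKETCH" is hypothetical and only mimics the idea.
 */
#define SKETCH_MAX_HEAP_TUPLES_PER_PAGE 291

typedef enum
{
    SKETCH_LP_UNUSED,           /* slot can be recycled for a new tuple */
    SKETCH_LP_NORMAL,           /* points at a live or recently dead tuple */
    SKETCH_LP_REDIRECT,         /* HOT chain redirect */
    SKETCH_LP_DEAD              /* stub left by pruning; tuple storage already gone */
} SketchLpState;

typedef struct
{
    size_t        raw_free_bytes;   /* gap between line pointer array and tuple data */
    int           nline;            /* line pointers currently allocated */
    SketchLpState lp[SKETCH_MAX_HEAP_TUPLES_PER_PAGE];
} SketchPage;

/* Bytes usable for one new heap tuple, in the spirit of PageGetHeapFreeSpace(). */
size_t
sketch_heap_free_space(const SketchPage *page)
{
    assert(page->nline >= 0 && page->nline <= SKETCH_MAX_HEAP_TUPLES_PER_PAGE);

    if (page->raw_free_bytes == 0)
        return 0;

    if (page->nline < SKETCH_MAX_HEAP_TUPLES_PER_PAGE)
        return page->raw_free_bytes;    /* a new line pointer can still be added */

    /* Line pointer array is full: only an LP_UNUSED slot lets a new tuple in. */
    for (int i = 0; i < page->nline; i++)
    {
        if (page->lp[i] == SKETCH_LP_UNUSED)
            return page->raw_free_bytes;
    }
    return 0;   /* e.g. 291 LP_DEAD stubs: plenty of bytes, zero usable space */
}
```

If all 291 slots hold LP_DEAD stubs, the function reports 0 no matter how many bytes sit in the hole. If any one slot is flipped to LP_UNUSED, which is what VACUUM's second heap pass does, the same bytes count again.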

> Another big source of inaccuracies here is that we don't credit
> RECENTLY_DEAD tuple space with being free space. Maybe that isn't a
> huge problem, but it makes it even harder to believe that precision in
> FSM accounting is an intrinsic good.

The difficulty here is that we don't know how long it will be before
that space can be reused. Those recently dead tuples could become dead
within a few milliseconds or stick around for hours. I've wondered
about the merits of some FSM that had built-in visibility awareness,
i.e. the capability to record something like "page X currently has Y
space free and after XID Z is all-visible it will have Y' space free".
That seems complex, but without it, we either have to bet that the
space will actually become free before anyone tries to use it, or that
it won't. If whatever guess we make is wrong, bad things happen.
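One way to picture the visibility-aware FSM entry described above is a record that carries both free-space numbers plus the XID that separates them. Everything here is hypothetical, and no such structure exists in PostgreSQL. XIDs are compared as plain integers with no wraparound handling. "All-visible" is reduced to "Z precedes an assumed oldest-xmin horizon".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical entry for a visibility-aware free space map, encoding
 * "page X currently has Y space free and after XID Z is all-visible
 * it will have Y' space free".
 */
typedef struct
{
    uint32_t block;          /* page X */
    size_t   free_now;       /* Y: usable immediately */
    uint32_t pending_xid;    /* Z: newest (assumed committed) deleter holding back space */
    size_t   free_after;     /* Y': usable once Z is visible to everyone */
} SketchVisFsmEntry;

/*
 * Simplification: plain integer comparison (no wraparound). "oldest_xmin"
 * stands for the oldest XID any snapshot might still need. In reality the
 * space would also have to be pruned before it could actually be reused.
 */
static bool
sketch_xid_all_visible(uint32_t xid, uint32_t oldest_xmin)
{
    return xid < oldest_xmin;
}

/* Space this page can be counted on for, given the current horizon. */
size_t
sketch_vis_fsm_space(const SketchVisFsmEntry *e, uint32_t oldest_xmin)
{
    assert(e->free_after >= e->free_now);
    return sketch_xid_all_visible(e->pending_xid, oldest_xmin)
        ? e->free_after
        : e->free_now;
}

/* Does this page satisfy a request of `need` bytes under the given horizon? */
bool
sketch_vis_fsm_fits(const SketchVisFsmEntry *e, size_t need, uint32_t oldest_xmin)
{
    return sketch_vis_fsm_space(e, oldest_xmin) >= need;
}
```

With such an entry, a backend looking for space could decide per request whether the pending space counts. The map would then not have to bake in one bet for everyone.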

> My remarks about "FSM_CATEGORIES-wise precision" were basically
> remarks about the fundamental problem with the free space map. Which
> is really that it's just a map of free space, that gives exactly zero
> thought to various high level things that *obviously* matter. I wasn't
> particularly planning on getting into the specifics of that with you
> now, on this thread.

Fair.

> A brief recap might be useful: other systems with a heap table AM free
> space management structure typically represent the free space
> available on each page using a far more coarse-grained counter.
> Usually one with fewer than 10 distinct increments. The immediate
> problem with FSM_CATEGORIES having such a fine granularity is that it
> increases contention/competition among backends that need to find some
> free space for a new tuple. They'll all diligently try to find the
> page with the least free space that still satisfies their immediate
> needs -- there is no thought for the second-order effects, which are
> really important in practice.
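To illustrate the granularity difference, the sketch below buckets a page's free bytes in two ways. The fine scale is a simplified imitation of PostgreSQL's 256 `FSM_CATEGORIES` at 8 KB, which uses 32 bytes per step; the real conversion treats the top category specially. The 8-bucket coarse scale and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8 KB pages. Fine scale ~ FSM_CATEGORIES; coarse scale is hypothetical. */
#define SKETCH_BLCKSZ        8192
#define SKETCH_FINE_CATS     256
#define SKETCH_COARSE_CATS   8

/* 32-byte steps, capped at the top category (simplified). */
unsigned
sketch_fine_category(size_t avail)
{
    assert(avail <= SKETCH_BLCKSZ);
    size_t cat = avail / (SKETCH_BLCKSZ / SKETCH_FINE_CATS);
    return cat >= SKETCH_FINE_CATS ? SKETCH_FINE_CATS - 1 : (unsigned) cat;
}

/* 1 KB steps: only 8 distinct values in total. */
unsigned
sketch_coarse_category(size_t avail)
{
    assert(avail <= SKETCH_BLCKSZ);
    size_t cat = avail / (SKETCH_BLCKSZ / SKETCH_COARSE_CATS);
    return cat >= SKETCH_COARSE_CATS ? SKETCH_COARSE_CATS - 1 : (unsigned) cat;
}
```

Pages with 2100 and 2200 free bytes fall into fine categories 65 and 68, but they share coarse bucket 2. Under the coarse scheme, a searcher has no reason to single out one of them.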

I think that the completely deterministic nature of the computation is
a mistake regardless of anything else. That serves to focus contention
rather than spreading it out, which is dumb, and would still be dumb
with any other number of FSM_CATEGORIES.
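The contention argument can be sketched by contrasting a deterministic best-fit search with a search that starts at a random page. This is an illustration under several assumptions. The flat array of categories stands in for the real FSM tree. The per-backend xorshift state is hypothetical. Random-start first fit is just one simple way to add randomness, not a proposal from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deterministic best fit: smallest category that still satisfies `need`. */
int
sketch_best_fit(const unsigned *cats, size_t npages, unsigned need)
{
    int best = -1;
    for (size_t i = 0; i < npages; i++)
    {
        if (cats[i] >= need && (best < 0 || cats[i] < cats[best]))
            best = (int) i;
    }
    return best;                /* -1 if no page fits */
}

/* Small xorshift PRNG so each backend can carry its own state (assumption). */
static uint32_t
sketch_xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Randomized choice: start the scan at a random page and take the first one
 * that fits, so backends with identical requests tend to land on different pages.
 */
int
sketch_random_fit(const unsigned *cats, size_t npages, unsigned need, uint32_t *state)
{
    assert(*state != 0);        /* xorshift must not be seeded with zero */
    if (npages == 0)
        return -1;
    size_t start = sketch_xorshift32(state) % npages;
    for (size_t k = 0; k < npages; k++)
    {
        size_t i = (start + k) % npages;
        if (cats[i] >= need)
            return (int) i;
    }
    return -1;                  /* no page fits */
}
```

Every backend that asks for the same amount gets the same answer from `sketch_best_fit()`, and that is what concentrates contention. `sketch_random_fit()` gives up some packing density to spread concurrent inserters across pages.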

> What I really wanted to convey is this: if you're going to go the
> route of ignoring LP_DEAD free space during vacuuming, you're
> conceding that having a high degree of precision about available free
> space isn't actually useful (or wouldn't be useful if it was actually
> possible at all). Which is something that I generally agree with. I'd
> just like it to be clear that you/Melanie are in fact taking one small
> step in that direction. We don't need to discuss possible later steps
> beyond that first step. Not right now.

Yeah. I'm not sure we're actually going to change that right now, but
I agree with the high-level point regardless, which I would summarize
like this: The current system provides more precision about available
free space than we actually need, while failing to provide some other
things that we really do need. We need not agree today on exactly what
those other things are or how best to get them in order to agree that
the current system has significant flaws, and we do agree that it
does.

--
Robert Haas
EDB: http://www.enterprisedb.com
