Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-11 21:49:01
Message-ID: CA+TgmoaQm4fXxVM5CwT1=ezVHc7t18XRR5WNZWeS9t=1igWf7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 11, 2024 at 2:30 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Attached v7 is rebased over the commit you just made to remove
> LVPagePruneState->hastup.

Apologies for making you rebase but I was tired of thinking about that patch.

I'm still kind of hung up on the changes that 0001 makes to vacuumlazy.c.

Say we didn't add the recordfreespace parameter to lazy_scan_prune().
Couldn't the caller just compute it? lpdead_items goes out of scope,
but there's also prstate.has_lpdead_items.

Pressing that gripe a bit more, I just figured out that "Wait until
lazy_vacuum_heap_rel() to save free space" gets turned into "If we
will likely do index vacuuming, wait until lazy_vacuum_heap_rel() to
save free space." That follows the good practice of phrasing the
comment conditionally when the comment is outside the if-statement.
But the if-statement says merely "if (recordfreespace)", which is not
obviously related to "If we will likely do index vacuuming" but under
the hood actually is. But it seems like an unpleasant amount of action
at a distance. If that condition said if (vacrel->nindexes > 0 &&
vacrel->do_index_vacuuming && prstate.has_lpdead_items)
UnlockReleaseBuffer(); else { PageGetHeapFreeSpace;
UnlockReleaseBuffer; RecordPageWithFreeSpace } it would be a lot more
obvious that the code was doing what the comment says.

That's a refactoring question, but I'm also wondering if there's a
functional issue. Pre-patch, if the table has no indexes and some
items are left LP_DEAD, then we mark them unused, forget them,
RecordPageWithFreeSpace(), and if enough pages have been visited,
FreeSpaceMapVacuumRange(). Post-patch, if the table has no indexes, no
items will be left LP_DEAD, and *recordfreespace will be set to true,
so we'll PageRecordFreeSpace(). But this seems to me to mean that
post-patch we'll PageRecordFreeSpace() in more cases than we do now.
Specifically, imagine a case where the current code wouldn't mark any
items LP_DEAD and the page also had no pre-existing items that were
LP_DEAD and the table also has no indexes. Currently, I believe we
wouldn't PageRecordFreeSpace(), but with the patch, I think we would.
Am I wrong?

Note that the call to FreeSpaceMapVacuumRange() seems to try to guard
against this problem by testing for vacrel->tuples_deleted >
tuples_already_deleted. I haven't tried to verify whether that guard
is correct, but the fact that FreeSpaceMapVacuumRange() has such a
guard and RecordPageWithFreeSpace() does not have one makes me
suspicious.

Another interesting effect of the patch is that instead of getting
lazy_vacuum_heap_page()'s handling of the all-visible status of the
page, we get the somewhat more complex handling done by
lazy_scan_heap(). I haven't fully through the consequences of that,
but if you have, I'd be interested to hear your analysis.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-01-11 22:01:55 Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Previous Message Alexander Lakhin 2024-01-11 20:00:01 Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed