Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-09 19:00:27
Message-ID: CA+TgmobGpT09H6fpChckHiER_dPh1rudw_=cyriEjv2f2XJw1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 9, 2024 at 11:35 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> The easiest solution would be to change the name of the parameter to
> heap_page_prune_execute()'s from "no_indexes" to something like
> "validate_unused", since it is only used in assert builds for
> validation.

Right.

> However, though I wish a name change was the right way to solve this
> problem, my gut feeling is that it is not. It seems like we should
> rely only on the WAL record itself in recovery. Right now the
> parameter is used exclusively for validation, so it isn't so bad. But
> what if someone uses this parameter in the future in heap_xlog_prune()
> to decide how to modify the page?

Exactly.

> It seems like the right solution would be to add a flag into the prune
> record indicating what to pass to heap_page_prune_execute(). In the
> future, I'd like to add flags for updating the VM to each of the prune
> and vacuum records (eliminating the separate VM update record). Thus,
> a new flags member of the prune record could have future use. However,
> this would add a uint8 to the record. I can go and look for some
> padding if you think this is the right direction?

I thought about this approach and it might be OK but I don't love it,
because it means making the WAL record bigger on production systems
for the sake of assertion that only fires for developers. Sure, it's
possible that there might be another use in the future, but there
might also not be another use in the future.

How about changing if (no_indexes) to if (ndead == 0) and adding a
comment like this: /* If there are any tuples being marked LP_DEAD,
then the relation must have indexes, so every item being marked unused
must be a heap-only tuple. But if there are no tuples being marked
LP_DEAD, then it's possible that the relation has no indexes, in which
case all we know is that the line pointer shouldn't already be
LP_UNUSED. */

BTW:

+ * LP_REDIRECT, or LP_DEAD items to LP_UNUSED
during pruning. We
+ * can't check much here except that, if the
item is LP_NORMAL, it
+ * should have storage before it is set LP_UNUSED.

Is it really helpful to check this here, or just confusing/grasping at
straws? I mean, the requirement that LP_NORMAL items have storage is a
general one, IIUC, not something that's specific to this situation. It
feels like the equivalent of checking that your children don't set
fire to the couch on Tuesdays.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tristan Partin 2024-01-09 19:20:37 Re: Add BF member koel-like indentation checks to SanityCheck CI
Previous Message Jacob Champion 2024-01-09 18:48:55 Re: [PoC] Federated Authn/z with OAUTHBEARER