Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-08 20:50:47
Message-ID: CA+TgmoaMAxnm7kPW8auJ-T1-nZ+D8LrBxkAHZS75utvsk_ZfiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 5, 2024 at 3:34 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Yes, attached is a patch set which does this. My previous patchset
> already reduced the number of places we unlock the buffer and update
> the freespace map in lazy_scan_heap(). This patchset combines the
> lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
> have a version which moves the freespace map updates into
> lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
> from lazy_scan_heap(). This is arguably more clear. But Andres
> mentioned he wanted fewer places unlocking the buffer and updating the
> FSM.

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

This part of 0002 makes me very, very uncomfortable:

+ /*
+ * Update all line pointers per the record, and repair
fragmentation.
+ * We always pass no_indexes as true, because we don't
know whether or
+ * not this option was used when pruning. This reduces
the validation
+ * done on replay in an assert build.
+ */
+ heap_page_prune_execute(buffer, true,

redirected, nredirected,
nowdead, ndead,

nowunused, nunused);

The problem that I have with this is that we're essentially saying
that it's ok to lie to heap_page_prune_execute because we know how
it's going to use the information, and therefore we know that the lie
is harmless. But that's not how things are supposed to work. We should
either find a way to tell the truth, or change the name of the
parameter so that it's not a lie, or change the function so that it
doesn't need this parameter in the first place, or something. I can
occasionally stomach this sort of lying as a last resort when there's
no realistic way of adapting the code being called, but that's surely
not the case here -- this is a newborn parameter, and it shouldn't be
a lie on day 1. Just imagine if some future developer thought that the
no_indexes parameter meant that the relation actually didn't have
indexes (the nerve of them!).

I took a VERY fast look through the rest of the patch set and I think
that the overall direction looks like it's probably reasonable, but
that's a very preliminary conclusion which I reserve the right to
revise after studying further. @Andres: Are you planning to
review/commit any of this? Are you hoping that I'm going to do that?
Somebody else want to jump in here?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-01-08 21:08:47 Re: Adding a pg_get_owned_sequence function?
Previous Message Tom Lane 2024-01-08 20:48:32 Make psql ignore trailing semicolons in \sf, \ef, etc