Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-06 15:34:59
Message-ID: CAAKRu_YknGDZvH2Wxke3rDhXwgv27dZdgNdxQZdh60=zYmukfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Patch 0001 in the attached set addresses the following review feedback:

- pronto_reap renamed to no_indexes
- reduce the number of callers of heap_prune_record_unused() by calling
it from heap_prune_record_dead() when appropriate
- add unlikely hint to no_indexes test

I've also dropped the patch which moves the test of hastup into
lazy_scan_[no]prune(). In the future, I plan to remove the loop from
lazy_scan_prune() which sets hastup and set it instead in
heap_page_prune(). Changes to hastup's usage could be done with that
change instead of in this set.

The one review point I did not address in the code is that which I've
responded to inline below.

Though not required for the immediate reaping feature, I've included
patches 0002-0004 which are purely refactoring. These patches simplify
lazy_scan_heap() by moving the visibility map code into
lazy_scan_prune() and consolidating the updates to the FSM and
vacrel->nonempty_pages. I've proposed them in this thread because there
is an interdependence between eliminating the lazy_vacuum_heap_page()
call, moving the VM code, and combining the three FSM updates
(lazy_scan_prune() [index and no index] and lazy_scan_noprune()).

This illustrates the code clarity benefits of the change to mark line
pointers LP_UNUSED during pruning if the table has no indexes. I can
propose them in another thread if 0001 is merged.

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 14de8158d49..b578c32eeb6 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
> > nunused = (end - nowunused);
> > Assert(nunused >= 0);
> >
> > - /* Update all line pointers per the record, and repair fragmentation */
> > - heap_page_prune_execute(buffer,
> > + /*
> > + * Update all line pointers per the record, and repair fragmentation.
> > + * We always pass pronto_reap as true, because we don't know whether
> > + * or not this option was used when pruning. This reduces the
> > + * validation done on replay in an assert build.
> > + */
>
> Hm, that seems not great. Both because we loose validation and because it
> seems to invite problems down the line, due to pronto_reap falsely being set
> to true in heap_page_prune_execute().

I see what you are saying. With the name change to no_indexes, it
would be especially misleading to future programmers who might decide
to use that parameter for something else. Are you thinking it would be
okay to check the catalog to see if the table has indexes in
heap_xlog_prune() or are you suggesting that I add some kind of flag
to the prune WAL record?

- Melanie

Attachment Content-Type Size
v4-0004-Conslidate-FSM-update-code-in-lazy_scan_heap.patch text/x-patch 3.6 KB
v4-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch text/x-patch 18.1 KB
v4-0002-Shift-VM-update-code-to-lazy_scan_prune.patch text/x-patch 10.9 KB
v4-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch text/x-patch 13.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Geoff Winkless 2024-01-06 15:38:31 Re: weird GROUPING SETS and ORDER BY behaviour
Previous Message vignesh C 2024-01-06 15:34:54 Re: pg_stat_statements and "IN" conditions