Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-04-01 16:08:56
Message-ID: 20240401160856.sukkjbbyee7e7kcp@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:
> On 30/03/2024 07:57, Melanie Plageman wrote:
> > On Fri, Mar 29, 2024 at 12:32:21PM -0400, Melanie Plageman wrote:
> > > On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > > > Here's another idea: In the first loop through the offsets, where we
> > > > gather the HTSV status of each item, also collect the offsets of all HOT
> > > > and non-HOT items to two separate arrays. Call heap_prune_chain() for
> > > > all the non-HOT items first, and then process any remaining HOT tuples
> > > > that haven't been marked yet.
> > >
> > > That's an interesting idea. I'll try it out and see how it works.
> >
> > Attached v10 implements this method of dividing tuples into HOT and
> > non-HOT and processing the potential HOT chains first then processing
> > tuples not marked by calling heap_prune_chain().
> >
> > I have applied the refactoring of heap_prune_chain() to master and then
> > built the other patches on top of that.
> Committed some of the changes. Continuing to reviewing the rest.


> > The early patches in the set include some additional comment cleanup as
> > well. 0001 is fairly polished. 0004 could use some variable renaming
> > (this patch partitions the tuples into HOT and not HOT and then
> > processes them). I was struggling with some of the names here
> > (chainmembers and chaincandidates is confusing).
> I didn't understand why you wanted to juggle both partitions in the same
> array. So I separated them into two arrays, and called them 'root_items' and
> 'heaponly_items'.

I thought it was worth it to save the space. And the algorithm for doing
it seemed pretty straightforward. But looking at your patch, it is a lot
easier to understand with two arrays (since, for example, they can each
have a name).

> In some micro-benchmarks, the order that the items were processed made a
> measurable difference. So I'm processing the items in the reverse order.
> That roughly matches the order the items are processed in master, as it
> iterates the offsets from high-to-low in the first loop, and low-to-high in
> the second loop.

This makes sense. I noticed you didn't there isn't a comment about this
above the loop. It might be worth it to mention it.

Below is a review of only 0001. I'll look at the others shortly.

> From a6ab891779876e7cc1b4fb6fddb09f52f0094646 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Mon, 1 Apr 2024 16:59:38 +0300
> Subject: [PATCH v11 1/7] Handle non-chain tuples outside of heap_prune_chain()
> ---
> src/backend/access/heap/pruneheap.c | 264 +++++++++++++++++-----------
> 1 file changed, 166 insertions(+), 98 deletions(-)
> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> @@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
> tup.t_tableOid = RelationGetRelid(relation);
> /*
> - * Determine HTSV for all tuples.
> + * Determine HTSV for all tuples, and queue them up for processing as HOT
> + * chain roots or as a heap-only items.

Reading this comment now as a whole, I would add something like
"Determining HTSV for all tuples once is required for correctness" to
the start of the second paragraph. The new conjunction on the first
paragraph sentence followed by the next paragraph is a bit confusing
because it sounds like queuing them up for processing is required for
correctness (which, I suppose it is on some level). Basically, I'm just
saying that it is now less clear what is required for correctness.

> * This is required for correctness to deal with cases where running HTSV
> * twice could result in different results (e.g. RECENTLY_DEAD can turn to
> * DEAD if another checked item causes GlobalVisTestIsRemovableFullXid()
> * to update the horizon, INSERT_IN_PROGRESS can change to DEAD if the
> - * inserting transaction aborts, ...). That in turn could cause
> - * heap_prune_chain() to behave incorrectly if a tuple is reached twice,
> - * once directly via a heap_prune_chain() and once following a HOT chain.
> + * inserting transaction aborts, ...). VACUUM assumes that there are no
> + * normal DEAD tuples left on the page after pruning, so it needs to have
> + * the same understanding of what is DEAD and what is not.
> *
> * It's also good for performance. Most commonly tuples within a page are
> * stored at decreasing offsets (while the items are stored at increasing
> @@ -282,52 +297,140 @@ heap_page_prune(Relation relation, Buffer buffer,

> + /*
> + * Process any heap-only tuples that were not already processed as part of
> + * a HOT chain.
> + */

While I recognize this is a matter of style and not important, I
personally prefer this for reverse looping:

for (int i = prstate.nheaponly_items; i --> 0;)

I do think a comment about the reverse order would be nice. I know it
says something above the first loop to this effect:

* Processing the items in reverse order (and thus the tuples in
* increasing order) increases prefetching efficiency significantly /
* decreases the number of cache misses.

So perhaps we could just say "as above, process the items in reverse

- Melanie

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2024-04-01 16:09:55 Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Previous Message Tristan Partin 2024-04-01 16:04:00 Re: psql not responding to SIGINT upon db reconnection