Re: Optimising compactify_tuples()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-09 22:39:56
Message-ID: CA+hUKGLhcvXFMvAG6O1UsE2sN2CUkMBNwdOFAHGEv4Szco_yXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 10, 2020 at 2:34 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I think you were adequately caffeinated. You're right that this is
> fairly simple to do, but it looks even more simple than looping twice
> of the array. I think it's just a matter of looping over the
> itemidbase backwards and putting the higher itemid tuples at the end
> of the page. I've done it this way in the attached patch.

Yeah, I was wondering how to make as much of the algorithm work in a
memory-forwards direction as possible (even the item pointer access),
but it was just a hunch. Once you have the adjacent-tuple merging
thing so you're down to just a couple of big memcpy calls, it's
probably moot anyway.

> I also added a presorted path which falls back on doing memmoves
> without the temp buffer when the itemidbase array indicates that
> higher lineitems all have higher offsets. I'm doing the presort check
> in the calling function since that loops over the lineitems already.
> We can just memmove the tuples in reverse order without overwriting
> any yet to be moved tuples when these are in order.

Great.

I wonder if we could also identify a range at the high end that is
already correctly sorted and maximally compacted so it doesn't even
need to be copied out.

+ * Do the tuple compactification. Collapse memmove calls for adjacent
+ * tuples.

s/memmove/memcpy/

> With the attached v3, performance is better. The test now runs
> recovery 65.6 seconds, vs master's 148.5 seconds. So about 2.2x
> faster.

On my machine I'm seeing 57s, down from 86s on unpatched master, for
the simple pgbench workload from
https://github.com/macdice/redo-bench/. That's not quite what you're
reporting but it still blows the doors off the faster sorting patch,
which does it in 74s.

> We should probably consider what else can be done to try to write
> pages with tuples for earlier lineitems earlier in the page. VACUUM
> FULLs and friends will switch back to the opposite order when
> rewriting the heap.

Yeah, and also bulk inserts/COPY. Ultimately if we flipped our page
format on its head that'd come for free, but that'd be a bigger
project with more ramifications.

So one question is whether we want to do the order-reversing as part
of this patch, or wait for a more joined-up project to make lots of
code paths collude on making scan order match memory order
(corellation = 1). Most or all of the gain from your patch would
presumably still apply if did the exact opposite and forced offset
order to match reverse-item ID order (correlation = -1), which also
happens to be the initial state when you insert tuples today; you'd
still tend towards a state that allows nice big memmov/memcpy calls
during page compaction. IIUC currently we start with correlation -1
and then tend towards correlation = 0 after many random updates
because we can't change the order, so it gets scrambled over time.
I'm not sure what I think about that.

PS You might as well post future patches with .patch endings so that
the cfbot tests them; it seems pretty clear now that a patch to
optimise sorting (as useful as it may be for future work) can't beat a
patch to skip it completely. I took the liberty of switching the
author and review names in the commitfest entry to reflect this.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-09-09 23:07:00 Re: v13: show extended stats target in \d
Previous Message Alvaro Herrera 2020-09-09 22:22:30 Re: v13: show extended stats target in \d