Re: Optimising compactify_tuples()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-08 00:07:51
Message-ID: CA+hUKGJLbeBqYij8zL7LV1bY6yPycO7rGbBwqcwgTSceh4-yQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 7, 2020 at 7:48 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I was wondering today if we could just get rid of the sort in
> compactify_tuples() completely. It seems to me that the existing sort
> is there just so that the memmove() is done in order of tuple at the
> end of the page first. We seem to be just shunting all the tuples to
> the end of the page so we need to sort the line items in reverse
> offset so as not to overwrite memory for other tuples during the copy.
>
> I wondered if we could get around that just by having another buffer
> somewhere and memcpy the tuples into that first then copy the tuples
> out that buffer back into the page. No need to worry about the order
> we do that in as there's no chance to overwrite memory belonging to
> other tuples.
>
> Doing that gives me 79.166 seconds in the same recovery test. Or about
> 46% faster, instead of 22% (I mistakenly wrote 28% yesterday)

Wow.

One thought is that if we're going to copy everything out and back in
again, we might want to consider doing it in a
memory-prefetcher-friendly order. Would it be a good idea to
rearrange the tuples to match line pointer order, so that the copying
work and also later sequential scans are in a forward direction? The
copying could also perhaps be done with single memcpy() for ranges of
adjacent tuples. Another thought is that it might be possible to
identify some easy cases that it can handle with an alternative
in-place shifting algorithm without having to go to the
copy-out-and-back-in path. For example, when the offset order already
matches line pointer order but some dead tuples just need to be
squeezed out by shifting ranges of adjacent tuples, and maybe some
slightly more complicated cases, but nothing requiring hard work like
sorting.

> (I don't want to derail the sort improvements here. I happen to think
> those are quite important improvements, so I'll continue to review
> that patch still. Longer term, we might just end up with something
> slightly different for compactify_tuples)

Yeah. Perhaps qsort specialisation needs to come back in a new thread
with a new use case.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-09-08 00:13:53 Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
Previous Message Alvaro Herrera 2020-09-07 23:44:27 Re: default partition and concurrent attach partition