Re: Optimising compactify_tuples()

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-08 15:47:05
Message-ID: CAApHDvrM0P=YDKFYEJECHCH9zWycu10hCJ8TFuNXZd6=_BMyzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 8 Sep 2020 at 12:08, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> One thought is that if we're going to copy everything out and back in
> again, we might want to consider doing it in a
> memory-prefetcher-friendly order. Would it be a good idea to
> rearrange the tuples to match line pointer order, so that the copying
> work and also later sequential scans are in a forward direction?

That's an interesting idea but wouldn't that require both the copy to
the separate buffer *and* a qsort? That's the worst of both
implementations. We'd need some other data structure too in order to
get the index of the sorted array by reverse lineitem point, which
might require an additional array and an additional sort.

> The
> copying could also perhaps be done with single memcpy() for ranges of
> adjacent tuples.

I wonder if the additional code required to check for that would be
cheaper than the additional function call. If it was then it might be
worth trying, but since the tuples can be in any random order then
it's perhaps not likely to pay off that often. I'm not really sure
how often adjacent line items will also be neighbouring tuples for
pages we call compactify_tuples() for. It's certainly going to be
common with INSERT only tables, but if we're calling
compactify_tuples() then it's not read-only.

> Another thought is that it might be possible to
> identify some easy cases that it can handle with an alternative
> in-place shifting algorithm without having to go to the
> copy-out-and-back-in path. For example, when the offset order already
> matches line pointer order but some dead tuples just need to be
> squeezed out by shifting ranges of adjacent tuples, and maybe some
> slightly more complicated cases, but nothing requiring hard work like
> sorting.

It's likely worth experimenting. The only thing is that the workload
I'm using seems to end up with the tuples with line items not in the
same order as the tuple offset. So adding a precheck to check the
ordering will regress the test I'm doing. We'd need to see if there is
any other workload that would keep the tuples more in order then
determine how likely that is to occur in the real world.

> > (I don't want to derail the sort improvements here. I happen to think
> > those are quite important improvements, so I'll continue to review
> > that patch still. Longer term, we might just end up with something
> > slightly different for compactify_tuples)
>
> Yeah. Perhaps qsort specialisation needs to come back in a new thread
> with a new use case.

hmm, yeah, perhaps that's a better way given the subject here is about
compactify_tuples()

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-09-08 16:30:15 Re: Optimising compactify_tuples()
Previous Message Alexey Kondratov 2020-09-08 15:34:31 Re: [POC] Fast COPY FROM command for the table with foreign partitions