Re: Use generation context to speed up tuplesorts

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2021-07-30 20:38:53
Message-ID: 20210730203853.utjj43f6zzn5e2hy@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-07-30 18:42:18 +1200, David Rowley wrote:
> While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
> sorts for single column sorts), I noticed that we use a separate
> memory context to store tuple data. If the sort fits in memory, we
> simply reset that context when we're done; if it does not, we dump
> the tuples to disk in the same order they were added and then reset
> the context. There is a little pfree() work going on via
> writetup_heap(), which I think could possibly just be removed to get
> some additional gains.
>
> Anyway, this context that stores tuples uses the standard aset.c
> allocator, which has the usual power-of-2 wastage and the additional
> overheads of freelists etc. I wondered how much faster it would go if
> I set it to use a generation context instead of an aset.c one.
>
> After running make installcheck to create the tenk1 table and then
> running the attached tuplesortbench script, I get this:

> So it seems that getting away from rounding allocations up to the
> next power of 2 saves quite a bit of memory.
>
> Performance-wise, there are some pretty good gains. (Results in TPS)

Very nice!

I wonder if there are cases where generation.c would regress
performance compared to aset.c due to not having an initial / "keeper"
block?

> The patch is just a simple 1-liner at the moment. I likely do need to
> adjust what I'm passing as the blockSize to GenerationContextCreate().
> Maybe a better number would be something that's calculated from
> work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64)
> so that we allocate at most a 16th of work_mem per block, but never
> more than 8MB. I don't think changing this will affect the
> performance of the above very much.
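
Presumably the 1-liner is roughly the following, in
tuplesort_begin_common() (a sketch from memory, not the actual
attached patch):

    /* was: AllocSetContextCreate(state->sortcontext, "Caller tuples",
     *                            ALLOCSET_DEFAULT_SIZES) */
    state->tuplecontext = GenerationContextCreate(state->sortcontext,
                                                  "Caller tuples",
                                                  ALLOCSET_DEFAULT_MAXSIZE);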

I think it's bad that both generation.c and slab.c lack internal
handling of block sizes. Having to err on the side of too-large blocks
to handle large amounts of memory well, just so the contexts don't
need to deal with variably sized blocks, isn't a sensible tradeoff.

I don't think it's acceptable to use ALLOCSET_DEFAULT_MAXSIZE or
Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64) for
tuplesort.c. There are plenty of cases where we'll just sort a handful
of tuples, and increasing the memory usage of those by a factor of
1024 isn't good. And the Min() doesn't help once a larger work_mem is
used.
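
To put numbers on that (work_mem is in kB; ALLOCSET_DEFAULT_MAXSIZE is
8MB; aset.c's default initial block is 8kB):

    work_mem = 4MB (4096):     Min(8MB, 4096 * 64B)    = 256kB blocks
    work_mem = 1GB (1048576):  Min(8MB, 1048576 * 64B) = 8MB blocks

So for any work_mem >= 128MB, even a sort of two tuples starts with an
8MB block, i.e. the factor of 1024 over aset.c's 8kB initial block.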

Nor would it be good to use thousands of small blocks for a large
in-memory tuplesort just because we're concerned about the initial
allocation size -- partly because of the per-block allocation
overhead, but importantly also because that would make context resets
more expensive.

To me this says that we should transplant aset.c's block-size growth
logic into generation.c.
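
Sketched with hypothetical nextBlockSize / maxBlockSize fields in
GenerationContext, mirroring AllocSetAlloc()'s doubling:

    /* in GenerationAlloc(), when a new block has to be malloc'd */
    blksize = set->nextBlockSize;
    set->nextBlockSize <<= 1;
    if (set->nextBlockSize > set->maxBlockSize)
        set->nextBlockSize = set->maxBlockSize;

    /* but always make the block big enough for the requested chunk */
    if (blksize < required_size)
        blksize = required_size;

    block = (GenerationBlock *) malloc(blksize);

That way a sort of a handful of tuples starts with a small first
block, while a large in-memory sort quickly ramps up to maxBlockSize
without needing thousands of blocks.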

There is at least one path using tuplecontext that reaches code that
could end up freeing enough memory for us to care about generation.c
effectively not reusing that memory:
tuplesort_putdatum() -> datumCopy() -> EOH_flatten_into()
On a quick look I didn't find any expanded-record user that frees
nontrivial amounts of memory, but I didn't look all that carefully.
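
For reference, the expanded-datum branch of datumCopy() is roughly
this (abridged):

    if (VARATT_IS_EXTERNAL_EXPANDED(vl))
    {
        /* flatten the expanded object into the caller's context,
         * i.e. tuplecontext when called from tuplesort_putdatum() */
        ExpandedObjectHeader *eoh = DatumGetEOHP(value);
        Size        resultsize = EOH_get_flat_size(eoh);
        char       *resultptr = (char *) palloc(resultsize);

        EOH_flatten_into(eoh, (void *) resultptr, resultsize);
        res = PointerGetDatum(resultptr);
    }

Any memory the type's flatten_into callback pfree()s there would not
be reused by generation.c until the whole context is reset.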

Greetings,

Andres Freund
