Re: Use generation context to speed up tuplesorts

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2021-07-30 21:13:26
Message-ID: bc765deb-ca2b-838e-f980-16a77dd0922b@enterprisedb.com
Lists: pgsql-hackers

On 7/30/21 10:38 PM, Andres Freund wrote:
> Hi,
>
> On 2021-07-30 18:42:18 +1200, David Rowley wrote:
>> While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
>> sorts for single column sorts), I noticed that we use a separate
>> memory context to store tuple data and we just reset when we're done
>> if the sort fits in memory, or we dump the tuples to disk in the same
>> order they're added and reset the context when it does not. There is
>> a little pfree() work going on via writetup_heap() which I think
>> possibly could just be removed to get some additional gains.
>>
>> Anyway, this context that stores tuples uses the standard aset.c
>> allocator which has the usual power of 2 wastage and additional
>> overheads of freelists etc. I wondered how much faster it would go if
>> I set it to use a generation context instead of an aset.c one.
>>
>> After running make installcheck to make the tenk1 table, running the
>> attached tuplesortbench script, I get this:
>
>> So it seems to save quite a bit of memory getting away from rounding
>> up allocations to the next power of 2.
>>
>> Performance-wise, there's some pretty good gains. (Results in TPS)
>
> Very nice!
>

Yes, very nice. I wouldn't have expected such a significant difference,
particularly in CPU usage. It's pretty interesting that it reduces both
memory and CPU usage; I'd have guessed it would be one or the other.

>
> I wonder if there's cases where generation.c would regress performance
> over aset.c due to not having an initial / "keeper" block?
>

Not sure. I guess such a workload would need to allocate and free a single
block (so very little memory) very often. That's possible, but I'm not
aware of a place doing it very often. Although maybe decoding could do
that for a simple (serial) workload.

I'm not opposed to adding a keeper block to Generation, similarly to
what was discussed for Slab not too long ago.

>
>> The patch is just a simple 1-liner at the moment. I likely do need to
>> adjust what I'm passing as the blockSize to GenerationContextCreate().
>> Maybe a better number would be something that's calculated from
>> work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64),
>> so that we just allocate at most a 16th of work_mem per chunk, but not
>> bigger than 8MB. I don't think changing this will affect the
>> performance of the above very much.
>
> I think it's bad that both generation and slab don't have internal
> handling of block sizes. Needing to err on the side of too-big blocks to
> handle large amounts of memory well, just so the contexts don't need to
> deal with variably sized blocks, isn't a sensible tradeoff.
>

Well, back then it seemed like a sensible trade off to me, but I agree
it may have negative consequences. I'm not opposed to revisiting this.

> I don't think it's acceptable to use ALLOCSET_DEFAULT_MAXSIZE or
> Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64) for
> tuplesort.c. There's plenty cases where we'll just sort a handful of
> tuples, and increasing the memory usage of those by a factor of 1024
> isn't good. The Min() won't do any good if a larger work_mem is used.
>
> Nor will it be good to use thousands of small allocations for a large
> in-memory tuplesort just because we're concerned about the initial
> allocation size. Both because of the allocation overhead, but
> importantly also because that will make context resets more expensive.
>

True.

> To me this says that we should transplant aset.c's block size growing
> into generation.c.
>

Yeah, maybe.

>
> There is at least one path using tuplecontext that reaches code that
> could end up freeing memory to a significant enough degree to care about
> generation.c effectively not using that memory:
> tuplesort_putdatum()->datumCopy()->EOH_flatten_into()
> On a quick look I didn't find any expanded record user that frees
> nontrivial amounts of memory, but I didn't look all that carefully.
>

Not sure, I'm not familiar with EOH_flatten_into or expanded records.
But I wonder if there's some sort of metric that we could track in
Generation and use it to identify "interesting" places.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
