Re: Use generation context to speed up tuplesorts

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2022-02-12 20:56:09
Message-ID: d24b7d97-7a8f-7416-b932-2c0a781c2409@enterprisedb.com
Lists: pgsql-hackers

On 1/20/22 08:36, Ronan Dunklau wrote:
> On Thursday, 20 January 2022 at 02:31:15 CET, David Rowley wrote:
>> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>>> I'm also a bit confused about which patch(es) should actually be reviewed
>>> in that thread. My understanding is that the original patch might not be
>>> relevant anymore but I might be wrong. The CF entry should probably be
>>> updated to clarify that, with an annotation / title change / author update
>>> or something.
>>>
>>> In the meantime I will switch the entry to Waiting on Author.
>>
>> I think what's going on here is a bit confusing. There's quite a few
>> patches on here that aim to do very different things.
>>
>> The patch I originally proposed was just to make tuplesort use
>> generation memory contexts. This helps speed up tuplesort because
>> generation contexts allow palloc() to be faster than the standard
>> allocator. Additionally, there's no power-of-2 wastage with generation
>> contexts like there is with the standard allocator. This helps fit
>> more tuples on fewer CPU cache lines.
>>
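Just to make the power-of-2 wastage concrete, here's a tiny standalone
sketch (not the actual aset.c / generation.c code, and the constants
are made up) contrasting the standard allocator's rounding of each
chunk up to the next power of two with the exact-size chunks a
generation context can hand out:

/*
 * Standalone illustration: an aset-style allocator rounds every chunk
 * up to the next power of two, while a generation-style allocator can
 * hand out exactly the requested size, so more tuples fit per block
 * (and per cache line).  Constants are illustrative, not aset.c's.
 */
#include <stdio.h>
#include <stddef.h>

static size_t
pow2_chunk_size(size_t request)
{
    size_t  size = 8;       /* assumed minimum chunk size */

    while (size < request)
        size <<= 1;
    return size;
}

int
main(void)
{
    size_t  tuple_widths[] = {24, 40, 72, 130, 260};

    for (int i = 0; i < 5; i++)
    {
        size_t  req = tuple_widths[i];
        size_t  got = pow2_chunk_size(req);

        printf("request %3zu -> aset chunk %3zu (%3zu bytes wasted), "
               "generation chunk %3zu\n",
               req, got, got - req, req);
    }
    return 0;
}

With a 130-byte tuple, for example, the rounding hands back a 256-byte
chunk, so nearly half the memory holding such tuples is wasted.
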
>> I believe Andres then mentioned that the fixed block size for
>> generation contexts might become a problem. With the standard
>> allocator the actual size of the underlying malloc() grows up to a
>> maximum size. With generation contexts it is always the same, so we
>> could end up with a very large number of blocks which means lots of
>> small mallocs which could end up slow. Tomas then posted a series of
>> patches to address this.
>>
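For anyone following along: the "grows up to a maximum size" behaviour
is simply doubling the block size for each new block until a cap is
reached, which is what the proposed patches teach the generation
context to do as well. A trivial standalone sketch of that pattern
(the constants are illustrative, not the real ones):

/*
 * Standalone sketch of growing block sizes: start small and double on
 * each new block, capping at a maximum, so a small sort allocates
 * little memory while a large sort ends up doing few, large mallocs.
 */
#include <stdio.h>
#include <stddef.h>

#define INIT_BLOCK_SIZE     (8 * 1024)          /* assumed starting size */
#define MAX_BLOCK_SIZE      (8 * 1024 * 1024)   /* assumed cap */

int
main(void)
{
    size_t  block_size = INIT_BLOCK_SIZE;

    for (int blockno = 1; blockno <= 12; blockno++)
    {
        printf("block %2d: %zu bytes\n", blockno, block_size);

        /* double the next block's size until we hit the cap */
        if (block_size < MAX_BLOCK_SIZE)
            block_size = (block_size * 2 > MAX_BLOCK_SIZE) ?
                MAX_BLOCK_SIZE : block_size * 2;
    }
    return 0;
}
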
>> I then posted another patch that has the planner choose the size of
>> the blocks to use for the generation context based on the estimated
>> tuple width and number of tuples. My hope there was to
>> improve performance further by making a better choice for how large to
>> make the blocks in the first place. This reduces the need for Tomas'
>> patch, but does not eliminate the need for it.
>>
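To illustrate the kind of sizing heuristic being described (the
function name and constants below are hypothetical, not the actual
patch): derive a starting block size from the planner's row count and
width estimates, clamped to some sane bounds:

/*
 * Hypothetical sketch: pick a generation-context block size from the
 * planner's estimates instead of using a fixed size.
 */
#include <stdio.h>
#include <stddef.h>

#define MIN_SORT_BLOCK      (64 * 1024)
#define MAX_SORT_BLOCK      (8 * 1024 * 1024)

static size_t
choose_sort_block_size(double est_rows, int est_width)
{
    double  est_bytes = est_rows * (est_width + 16);    /* rough per-tuple overhead */
    size_t  block = MIN_SORT_BLOCK;

    /* grow to the next power of two that would hold the whole input */
    while (block < est_bytes && block < MAX_SORT_BLOCK)
        block <<= 1;

    return block;
}

int
main(void)
{
    printf("1000 rows of width 100 -> %zu byte blocks\n",
           choose_sort_block_size(1000, 100));
    printf("10M rows of width 40   -> %zu byte blocks\n",
           choose_sort_block_size(10000000.0, 40));
    return 0;
}

Of course that only helps when the estimates are reasonable, which is
exactly the problem with the incremental sort and aggregate cases
mentioned below.
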
>> As of now, I still believe we'll need Tomas' patches to allow the
>> block size to grow up to a maximum size. I think those patches are
>> likely needed before we think about making tuplesort use generation
>> contexts. The reason I believe this is that we don't always have good
>> estimates for the number of tuples we're going to sort. nodeSort.c is
>> fairly easy as it just fetches tuples once and then sorts them. The
>> use of tuplesort for nodeIncrementalSort.c is much more complex as
>> we'll likely be performing a series of smaller sorts which are much
>> harder to get size estimates for. This means there's likely no magic
>> block size that we can use for the generation context that's used to
>> store the tuples in tuplesort. This is also the case for order by
>> aggregates and probably most other usages of tuplesort.
>>
>
> You left out the discussion I started about the specific behaviour of
> glibc's malloc.
>
> I tried to demonstrate that a big part of the performance gain you were
> seeing was not in fact due to our allocator by itself, but to the way
> allocations with different block sizes interact with this particular
> malloc implementation and how it handles releasing memory back to the
> system. I also tried to show that we can give DBAs more control over
> how much memory to "waste" as the price for faster allocations.
>
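For anyone who wants to experiment with that glibc behaviour directly:
mallopt() is the real glibc interface for it, though the values below
are arbitrary examples rather than recommendations. Raising
M_TRIM_THRESHOLD makes glibc keep freed memory around instead of
returning it to the kernel, trading RSS for cheaper future
allocations:

/* glibc-specific; this won't do anything useful on other libcs */
#include <malloc.h>
#include <stdio.h>

int
main(void)
{
    /* keep up to 64MB of freed memory instead of trimming the heap */
    if (!mallopt(M_TRIM_THRESHOLD, 64 * 1024 * 1024))
        fprintf(stderr, "mallopt(M_TRIM_THRESHOLD) failed\n");

    /* pre-extend the heap by 4MB whenever it has to grow */
    if (!mallopt(M_TOP_PAD, 4 * 1024 * 1024))
        fprintf(stderr, "mallopt(M_TOP_PAD) failed\n");

    /* ... run the allocation-heavy workload here ... */
    return 0;
}

The same knobs can also be set without recompiling via the
MALLOC_TRIM_THRESHOLD_ and MALLOC_TOP_PAD_ environment variables.
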
> I agree that the generation context memory manager is useful in the
> tuplesort context, if only because the reduced memory wastage means we
> fall back to disk less quickly, but I would be wary of drawing
> conclusions too quickly based on a specific malloc implementation which
> shows threshold effects, compounded by our code for growing block sizes
> in general.
>

Yeah, I share this concern - how much of the improvement is real, and
how much is due to accidentally not hitting some glibc-specific
self-tuning heuristics? Would realistic workloads get the same
improvement, or would they regress? And what about non-glibc systems
with a different malloc implementation?

I'm not against pushing the generation context improvements, e.g.
something like the patches posted in [1], because those seem reasonable
in general. But I'm somewhat confused about the last patch (adjusting
allocChunkLimit) causing fairly significant regressions.
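
For context, a rough sketch of what an allocChunkLimit-style threshold
does (a simplification of the aset.c behaviour, not the real code; I'm
assuming the generation-context patch uses its threshold in the same
spirit): requests above the limit get their own dedicated block and
skip the power-of-2 rounding, everything below is rounded up and
carved out of shared blocks, so moving the limit changes which
allocations take which path:

/* Simplified illustration only; not the actual allocator logic. */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

#define ALLOC_CHUNK_LIMIT   8192    /* illustrative value */

static void *
sketch_palloc(size_t size)
{
    if (size > ALLOC_CHUNK_LIMIT)
    {
        /* oversized chunk: dedicated block, no power-of-2 rounding */
        printf("%zu bytes -> dedicated block\n", size);
        return malloc(size);
    }
    else
    {
        size_t  chunk = 8;

        while (chunk < size)
            chunk <<= 1;
        printf("%zu bytes -> %zu-byte chunk from a shared block\n",
               size, chunk);
        return malloc(chunk);   /* the real code carves this from a block */
    }
}

int
main(void)
{
    free(sketch_palloc(100));
    free(sketch_palloc(5000));
    free(sketch_palloc(20000));
    return 0;
}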

I'll try investigating this a bit more next week.

regards

[1]
https://www.postgresql.org/message-id/flat/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com#a5ecd4e5a9e7d5b07ff25b5684da894f

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
