Re: Use generation context to speed up tuplesorts

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2022-01-20 07:36:46
Message-ID: 4377612.LvFx2qVVIh@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le jeudi 20 janvier 2022, 02:31:15 CET David Rowley a écrit :
> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > I'm also a bit confused about which patch(es) should actually be reviewed
> > in that thread. My understanding is that the original patch might not be
> > relevant anymore but I might be wrong. The CF entry should probably be
> > updated to clarify that, with an annotation/ title change / author update
> > or something.
> >
> > In the meantime I will switch the entry to Waiting on Author.
> I think what's going on here is a bit confusing. There's quite a few
> patches on here that aim to do very different things.
> The patch I originally proposed was just to make tuplesort use
> generation memory contexts. This helps speed up tuplesort because
> generation contexts allow palloc() to be faster than the standard
> allocator. Additionally, there's no power-of-2 wastage with generation
> contexts like there is with the standard allocator. This helps fit
> more tuples on fewer CPU cache lines.
> I believe Andres then mentioned that the fixed block size for
> generation contexts might become a problem. With the standard
> allocator the actual size of the underlying malloc() grows up until a
> maximum size. With generation contexts this is always the same, so we
> could end up with a very large number of blocks which means lots of
> small mallocs which could end up slow. Tomas then posted a series of
> patches to address this.
> I then posted another patch that has the planner make a choice on the
> size of the blocks to use for the generation context based on the
> tuple width and number of tuple estimates. My hope there was to
> improve performance further by making a better choice for how large to
> make the blocks in the first place. This reduces the need for Tomas'
> patch, but does not eliminate the need for it.
> As of now, I still believe we'll need Tomas' patches to allow the
> block size to grow up to a maximum size. I think those patches are
> likely needed before we think about making tuplesort use generation
> contexts. The reason I believe this is that we don't always have good
> estimates for the number of tuples we're going to sort. nodeSort.c is
> fairly easy as it just fetches tuples once and then sorts them. The
> use of tuplesort for nodeIncrementalSort.c is much more complex as
> we'll likely be performing a series of smaller sorts which are much
> harder to get size estimates for. This means there's likely no magic
> block size that we can use for the generation context that's used to
> store the tuples in tuplesort. This is also the case for order by
> aggregates and probably most other usages of tuplesort.

You left out the discussion I started about glibc's malloc specific behaviour.

I tried to demonstrate that a big part of the performance gain you were seeing
were not in fact due to our allocator by itself, but by the way different block
sizes allocations interact with this malloc implementation and how it handles
releasing memory back to the system. I also tried to show that we can give
DBAs more control over how much memory to "waste" as the price for faster

I agree that the generation context memory manager is useful in the tuplesort
context, if only for the fact that we fall back to disk less quickly due to
the reduced wastage of memory, but I would be wary of drawing conclusions too
quickly based on a specific malloc implementation which shows threshold effects,
which are compounded by our growing block sizes code in general.

I have on my TODO list to run the same kind of benchmarks using jemalloc, to
better isolate the performance gains expected from the generation allocator
itself from the side effect of avoiding glibc's pattern of releasing memory
back to the kernel too quickly.

Julien, sorry for the confusion of adding that discussion and those patches
to the same thread, but I don't really see a way of dissociating those two

Ronan Dunklau

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message 2022-01-20 07:37:18 RE: Support tab completion for upper character inputs in psql
Previous Message Michael Paquier 2022-01-20 07:03:02 Re: Refactoring of compression options in pg_basebackup