Re: PG15 beta1 sort performance regression due to Generation context change

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: PG15 beta1 sort performance regression due to Generation context change
Date: 2022-05-22 19:11:46
Message-ID: CAEudQAoApw7T3wCv89oCF7tksR6Y8i2YTyprZZaRgYo8snzS2g@mail.gmail.com
Lists: pgsql-hackers

Hi David,

>Over the past few days I've been gathering some benchmark results
>together to show the sort performance improvements in PG15 [1].

>One of the test cases I did was to demonstrate Heikki's change to use
>a k-way merge (65014000b).

>The test I did to try this out was along the lines of:

>set max_parallel_workers_per_gather = 0;
>create table t (a bigint not null, b bigint not null, c bigint not
>null, d bigint not null, e bigint not null, f bigint not null);

>insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
>vacuum freeze t;

>The query I ran was:

>select * from t order by a offset 140247142;

I redid this test here:
Windows 10 64 bits
msvc 2019 64 bits
RAM 8GB
SSD 256 GB

HEAD (default configuration)
Time: 229396,551 ms (03:49,397)
PATCHED:
Time: 220887,346 ms (03:40,887)

>I tested various sizes of work_mem starting at 4MB and doubled that
>all the way to 16GB. For many of the smaller values of work_mem the
>performance is vastly improved by Heikki's change, however for
>work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
>seconds and PG15 beta 1 took 29 seconds!

>I've been trying to get to the bottom of this today and finally have
>discovered this is due to the tuple size allocations in the sort being
>exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
>to store tuples in sorts) the tuple for the sort would be stored in an
>aset context. After 40af10b57 we'll use a generation context. The
>idea with that change is that the generation context does no
>power-of-2 round ups for allocations, so we save memory in most cases.
>However, due to this particular test having a tuple size of 64-bytes,
>there was no power-of-2 wastage with aset.

>The problem is that generation chunks have a larger chunk header than
>aset do due to having to store the block pointer that the chunk
>belongs to so that GenerationFree() can increment the nfree chunks in
>the block. aset.c does not require this as freed chunks just go onto a
>freelist that's global to the entire context.
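
To make the overhead concrete, here is a rough sketch I put together while
reading this (the chunk header sizes are my approximations for a 64-bit
build of PG15's aset.c/generation.c, not exact figures):

#include <stdio.h>

#define ASET_CHUNK_HDR	16	/* assumed: chunk size + owning context */
#define GEN_CHUNK_HDR	24	/* assumed: adds the owning block pointer */

/* round up to the next power of two, as aset.c does for small chunks */
static size_t
pow2_roundup(size_t sz)
{
	size_t	p = 8;

	while (p < sz)
		p <<= 1;
	return p;
}

int
main(void)
{
	size_t	sizes[] = {56, 64, 72};

	for (int i = 0; i < 3; i++)
	{
		size_t	tup = sizes[i];

		printf("tuple %zu bytes: aset %zu, generation %zu\n",
			   tup,
			   ASET_CHUNK_HDR + pow2_roundup(tup),
			   GEN_CHUNK_HDR + tup);	/* generation.c keeps the size as is */
	}
	return 0;
}

With those assumptions, a 56-byte tuple is a tie (80 vs 80), the 64-byte
tuple of this test is the worst case for generation (80 vs 88), and a
72-byte tuple is a clear win for generation (144 vs 96), which matches the
column "g" example below.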

>Basically, for my test query, the slowdown is because instead of being
>able to store 620702 tuples per tape over 226 tapes with an aset
>context, we can now only store 576845 tuples per tape resulting in
>requiring 244 tapes when using the generation context.

>If I had added column "g" to make the tuple size 72 bytes causing
>aset's code to round allocations up to 128 bytes and generation.c to
>maintain the 72 bytes then the sort would have stored 385805 tuples
>over 364 batches for aset and 538761 tuples over 261 batches using the
>generation context. That would have been a huge win.

>So it basically looks like I discovered a very bad case that causes a
>significant slowdown. Yet other cases that are not an exact power of
>2 stand to gain significantly from this change.

>One thing 40af10b57 does is stops those terrible performance jumps
>when the tuple size crosses a power-of-2 boundary. The performance
>should be more aligned to the size of the data being sorted now...
>Unfortunately, that seems to mean regressions for large sorts with
>power-of-2 sized tuples.

It seems to me that the solution would be to use aset allocations when the
tuple size is a power of 2?

if (state->sortopt & TUPLESORT_ALLOWBOUNDED ||
    (state->memtupsize & (state->memtupsize - 1)) == 0)
    state->tuplecontext = AllocSetContextCreate(state->sortcontext,
                                                "Caller tuples",
                                                ALLOCSET_DEFAULT_SIZES);
else
    state->tuplecontext = GenerationContextCreate(state->sortcontext,
                                                  "Caller tuples",
                                                  ALLOCSET_DEFAULT_SIZES);

I took a look and tried some improvements to see if I could get a better result.

Would you mind taking a look and testing?

regards,

Ranier Vilela

Attachment Content-Type Size
sort_bench.txt text/plain 10.1 KB
002-improve-sort.patch application/octet-stream 5.3 KB
