Re: Use generation context to speed up tuplesorts

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2021-08-06 13:07:27
Message-ID: CAApHDvqMyMQc9b-mBnGvqsudfVysgD4Xz7c7LsGrP524bsv47w@mail.gmail.com
Lists: pgsql-hackers

On Wed, 4 Aug 2021 at 02:10, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> A review would be nice, although it can wait - It'd be interesting to
> know if those patches help with the workload(s) you've been looking at.

I tried out the v2 set of patches using the attached scripts. The
attached spreadsheet includes the original tests and compares three
builds: master, the patch which uses the generation context, and that
patch plus your v2 patches.

I've also included 4 additional tests, each of which starts with a
1-column table and then adds another 32 columns, testing the
performance after adding each additional column. I did this because I
wanted to see if the performance was more similar to master when the
allocations had less power-of-2 wastage from allocset. If, for
example, you look at row 123 of the spreadsheet, you can see that both
patched and unpatched the allocations were 272 bytes each, yet there
was still a 50% performance improvement with just the generation
context patch when compared to master.
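
To illustrate what I mean by power-of-2 wastage, here is a minimal
sketch, not code from the patch or from aset.c. The function names and
the 8-byte minimum chunk size are assumptions for illustration only. It
rounds a request up to the next power of 2 and reports the unused
bytes:

```c
#include <stddef.h>

/*
 * Round a request up to the next power of 2, the way a power-of-2
 * allocator sizes its chunks. The minimum of 8 bytes is an assumption
 * made for this sketch.
 */
static size_t
next_pow2_size(size_t request)
{
    size_t size = 8;

    while (size < request)
        size <<= 1;
    return size;
}

/* Bytes allocated but not asked for. */
static size_t
pow2_waste(size_t request)
{
    return next_pow2_size(request) - request;
}
```

A request that is already a power of 2 wastes nothing, while a request
one byte over a power of 2 wastes almost half of its chunk.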

Looking at the spreadsheet, you'll also notice that in the 2-column
test of each of the 4 new tests, the number of bytes used for each
allocation is larger with the generation context: 56 vs 48. This is
due to the GenerationChunk struct being larger than the Allocset's
version by 8 bytes, because it also holds a pointer to the
GenerationBlock. So with the patch there are some cases where we'll
use slightly more memory.
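
A rough sketch of where the extra 8 bytes come from, using simplified
stand-in structs. These are hypothetical and are not the real
definitions from aset.c or generation.c; the names and field order are
assumptions. The only point is that one extra pointer in the chunk
header costs sizeof(void *) per allocation:

```c
#include <stddef.h>

/* Hypothetical, simplified chunk header with two fields. */
typedef struct SketchAllocChunk
{
    size_t      size;       /* usable size of the chunk */
    void       *context;    /* owning memory context */
} SketchAllocChunk;

/* Hypothetical, simplified chunk header that also tracks its block. */
typedef struct SketchGenerationChunk
{
    void       *block;      /* block this chunk lives in (extra field) */
    void       *context;    /* owning memory context */
    size_t      size;       /* usable size of the chunk */
} SketchGenerationChunk;

/* Per-chunk overhead added by the extra block pointer. */
static size_t
header_overhead_difference(void)
{
    return sizeof(SketchGenerationChunk) - sizeof(SketchAllocChunk);
}
```

On a typical 64-bit build that difference is 8 bytes, which matches the
56 vs 48 seen in the spreadsheet.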

Additional tests:

1. Sort 10000 tuples on a column with values 0-99 in memory.
2. As #1 but with 1 million tuples.
3. As #1 but with a large OFFSET to remove the overhead of sending to the client.
4. As #2 but with a large OFFSET.

Test #3 above is the most similar one to the original tests and shows
similar gains. When the sort becomes larger (1 million tuple test),
the gains reduce. This indicates the gains are coming from improved
CPU cache efficiency from the removal of the power of 2 wastage in
memory allocations.

All of the tests show that the patches to improve the allocation
efficiency of generation.c don't help to improve the results of the
test cases. I wonder if it's worth seeing what happens if, instead of
doubling the allocation size each time, we quadruple it. I didn't try
this.
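
For a feel of how the two growth rates differ, here is a small sketch
that counts how many blocks get allocated before a target amount of
memory is reached. This is not code from generation.c; the function
name and the sizes in the example below are made up for illustration:

```c
#include <stddef.h>

/*
 * Count the blocks allocated before the total capacity reaches
 * 'target', when each new block is 'factor' times the size of the
 * previous one, capped at 'max_size'.
 */
static int
blocks_needed(size_t init_size, size_t max_size, size_t factor,
              size_t target)
{
    size_t      block = init_size;
    size_t      total = 0;
    int         nblocks = 0;

    while (total < target)
    {
        total += block;
        nblocks++;

        /* Grow the next block, without exceeding the cap. */
        if (block <= max_size / factor)
            block *= factor;
        else
            block = max_size;
    }
    return nblocks;
}
```

With an assumed 1kB initial block and a 1MB cap, reaching 1MB takes 11
blocks when doubling and 6 when quadrupling.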

David

Attachment                          Content-Type                                    Size
sortbench_1m.sh.txt                 text/plain                                      736 bytes
generation context tuplesort.ods    application/vnd.oasis.opendocument.spreadsheet  67.1 KB
sortbench_10k.sh.txt                text/plain                                      734 bytes
