Re: Use generation context to speed up tuplesorts

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2021-08-07 00:10:35
Message-ID: 13808af0-2bb5-b506-62d0-1fb67e3385d0@enterprisedb.com
Lists: pgsql-hackers

On 8/6/21 3:07 PM, David Rowley wrote:
> On Wed, 4 Aug 2021 at 02:10, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> A review would be nice, although it can wait - it'd be interesting to
>> know if those patches help with the workload(s) you've been looking at.
>
> I tried out the v2 set of patches using the attached scripts. The
> attached spreadsheet includes the original tests and compares master
> with the patch which uses the generation context vs that patch plus
> your v2 patch.
>
> I've also included 4 additional tests, each of which starts with a 1
> column table and then adds another 32 columns testing the performance
> after adding each additional column. I did this because I wanted to
> see if the performance was more similar to master when the allocations
> had less power of 2 wastage from allocset. If, for example, you look
> at row 123 of the spreadsheet you can see both patched and unpatched
> the allocations were 272 bytes each yet there was still a 50%
> performance improvement with just the generation context patch when
> compared to master.
>
> Looking at the spreadsheet, you'll also notice that in the 2 column
> test of each of the 4 new tests the number of bytes used for each
> allocation is larger with the generation context. 56 vs 48. This is
> due to the GenerationChunk struct being larger than the AllocSet
> version by 8 bytes, because it also holds a pointer to its
> GenerationBlock. So with the patch there are some cases where we'll
> use slightly more memory.
>
> Additional tests:
>
> 1. Sort 10000 tuples on a column with values 0-99 in memory.
> 2. As #1 but with 1 million tuples.
> 3. As #1 but with a large OFFSET to remove the overhead of sending to the client.
> 4. As #2 but with a large OFFSET.
>
> Test #3 above is the most similar one to the original tests and shows
> similar gains. When the sort becomes larger (1 million tuple test),
> the gains reduce. This indicates the gains are coming from improved
> CPU cache efficiency from the removal of the power of 2 wastage in
> memory allocations.
>
> All of the tests show that the patches to improve the allocation
> efficiency of generation.c don't help to improve the results of the
> test cases. I wondered if it's worth seeing what happens if, instead
> of doubling the block size each time, we quadruple it. I didn't try
> this.
>

Thanks for the scripts and the spreadsheet with results.

I doubt quadrupling the allocations will help very much, but I suspect
the problem might be in the 0004 patch - at least that's what shows a
regression in my results. Could you try with just 0001-0003 applied?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
