Re: PG15 beta1 sort performance regression due to Generation context change

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG15 beta1 sort performance regression due to Generation context change
Date: 2022-06-03 03:13:43
Message-ID: CAApHDvowHNSVLhMc0cnovg8PfnYQZxit-gP_bn3xkT4rZX3G0w@mail.gmail.com
Lists: pgsql-hackers

On Wed, 1 Jun 2022 at 03:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Right now my vote would be to leave things as they stand for v15 ---
> the performance loss that started this thread occurs in a narrow
> enough set of circumstances that I don't feel too much angst about
> it being the price of winning in most other circumstances. We can
> investigate these options at leisure for v16 or later.

I've been hesitating a little to put my views here, as I wanted to see
what the other views were first. My thoughts are generally in
agreement with yours, i.e., do nothing about this for PG15. My
reasoning is:

1. Most cases are faster as a result of using generation contexts for sorting.
2. The slowdown cases seem rare and the speedup cases are much more common.
3. There were performance cliffs in PG14 whenever adding a column to a
table pushed the tuple size across a power-of-2 boundary, and I don't
recall anyone complaining about those (see the rounding sketch after
this list). PG15 makes the performance drop more gradual as tuple
sizes increase, so performance is more predictable as a result.
4. As I just demonstrated in [1], if anyone is caught by this and has
a problem, only a very small work_mem increase is needed to get
performance back to better than PG14's. I found that setting work_mem
to 64.3MB made PG15 faster than PG14 for the problem case. Anyone who
happens to hit this case and finds the regression unacceptable has a
way out... increase work_mem a little.

Also, in terms of what we might do to improve this situation for PG16:
I was discussing this off-list with Andres, which resulted in him
prototyping a patch [2] to store the memory context type in 3 bits of
the 64 bits just prior to the chunk pointer. Those bits are used to
look up a memory context method table so that we can call the correct
function for the chunk. I've been hacking around with this, and after
adding some optimisations I've got the memory allocation test [3]
(modified to use aset.c rather than generation.c) showing very
promising results when comparing this patch to master.
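To sketch roughly how that lookup works (hypothetical names below, not
the code from Andres's actual patch): the low 3 bits of the uint64
immediately before the chunk pointer index into a small array of
method tables, so pfree() can dispatch without knowing which allocator
owns the chunk.

#include <stddef.h>
#include <stdint.h>

typedef struct MemoryContextMethodsSketch
{
    void        (*free_p) (void *pointer);
    void       *(*realloc_p) (void *pointer, size_t size);
} MemoryContextMethodsSketch;

#define MCTX_METHOD_BITS    3
#define MCTX_METHOD_MASK    ((UINT64_C(1) << MCTX_METHOD_BITS) - 1)

/* one slot per context type; 3 bits allow for up to 8 allocator types */
static const MemoryContextMethodsSketch *mctx_methods[1 << MCTX_METHOD_BITS];

/* read the uint64 just before the chunk and extract the method ID */
static inline const MemoryContextMethodsSketch *
GetChunkMethods(void *pointer)
{
    uint64_t    header = *((uint64_t *) pointer - 1);

    return mctx_methods[header & MCTX_METHOD_MASK];
}

/* pfree() then just dispatches through the method table */
void
pfree_sketch(void *pointer)
{
    GetChunkMethods(pointer)->free_p(pointer);
}

The point of the scheme is that all three allocators can share one
header format for dispatch purposes, rather than each needing its own
way to get back to its owning context.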

There are still a few slowdowns, but allocations from 16 bytes up to
256 bytes are looking pretty good: up to ~10% faster than master.

(lower is better)

size (bytes)  time vs. master
8             114.86%
16            89.04%
32            90.95%
64            94.17%
128           93.36%
256           96.57%
512           101.25%
1024          109.88%
2048          100.87%

There's quite a bit more work to do in deciding how to handle large
allocations, and there's also likely more that can be done to further
shrink the existing chunk headers for each of the 3 existing memory
allocators.

David

[1] https://www.postgresql.org/message-id/CAApHDvq8MoEMxHN+f=RcCfwCfr30An1w3uOKruUnnPLVRR3c_A@mail.gmail.com
[2] https://github.com/anarazel/postgres/tree/mctx-chunk
[3] https://www.postgresql.org/message-id/attachment/134021/allocate_performance_function.patch
