Re: PG15 beta1 sort performance regression due to Generation context change

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: PG15 beta1 sort performance regression due to Generation context change
Date: 2022-07-15 23:27:47
Message-ID: 1ebabe9a-0345-d460-9cd1-43ccb843d9d5@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the very detailed analysis. Comments inline.

On 7/15/22 7:12 PM, David Rowley wrote:
> On Sat, 16 Jul 2022 at 10:40, Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>> What I find interesting is the resistance to adding any documentation
>> around this feature to guide users in case they hit the regression. I
>> understand it can be difficult to provide guidance on issues related to
>> adjusting work_mem, but even just a hint in the release notes to say "if
>> you see a performance regression you may need to adjust work_mem" would
>> be helpful. This would help people who are planning upgrades to at least
>> know what to watch out for.
>
> Looking back at the final graph in the blog [1], l see that work_mem
> is a pretty surprising GUC. I'm sure many people would expect that
> setting work_mem to some size that allows the sort to be entirely done
> in RAM would be the fastest way. And that does appear to be the case,
> as 16GB was the only setting which allowed that. However, I bet it
> would surprise many people to see that 8GB wasn't 2nd fastest. Even
> 128MB was faster than 8GB!

Yeah that is interesting. And while some of those settings are less
likely in the wild, I do think we are going to see larger and larger
"work_mem" settings as instance sizes continue to grow. That said, your
PG15 benchmarks are overall faster than the PG14, and that is what I am
looking at in the context of this release.

> Most likely that's because the machine I tested that on has lots of
> RAM spare for kernel buffers which would allow all that disk activity
> for batching not actually to cause physical reads or writes. I bet
> that would have looked different if I'd run a few concurrent sorts
> with 128MB of work_mem. They'd all be competing for kernel buffers in
> that case.
>
> So I agree with Andres here. It seems weird to me to try to document
> this new thing that I caused when we don't really make any attempt to
> document all the other weird stuff with work_mem.

I can't argue with this.

My note on the documentation was primarily around to seeing countless
user issues post-upgrade where queries that "once performed well no
longer do so." I want to ensure that our users at least have a starting
point to work on resolving the issues, even if they end up being very
nuanced.

Perhaps a next step (and a separate step from this) is to assess the
guidance we give on the upgrade page[1] about some common things they
should check for. Then we can have the "boilerplate" there.

> I think the problem can actually be worse with work_mem sizes in
> regards to hash tables. The probing phase of a hash join causes
> memory access patterns that the CPU cannot determine which can result
> in poor performance when the hash table size is larger than the CPU's
> L3 cache size. If you have fast enough disks, it seems realistic that
> given the right workload (most likely much more than 1 probe per
> bucket) that you could also get better performance by having lower
> values of work_mem.
>
> If we're going to document the generic context anomaly then we should
> go all out and document all of the above, plus all the other weird
> stuff I've not thought of. However, I think, short of having an
> actual patch to review, it might be better to leave it until someone
> can come up with some text that's comprehensive enough to be worthy of
> reading. I don't think I could do the topic justice. I'm also not
> sure any wisdom we write about this would be of much use in the real
> world given that its likely concurrency has a larger effect, and we
> don't have much ability to control that.

Understood. I don't think that is fair to ask for this release, but
don't sell your short on explaining the work_mem nuances.

> FWIW, I think it would be better for us just to solve these problems
> in code instead. Having memory gating to control the work_mem from a
> pool and teaching sort about CPU caches might be better than
> explaining to users that tuning work_mem is hard.

+1. Again thank you for taking the time for the thorough explanation and
of course, working on the patch and fixes.

Jonathan

[1] https://www.postgresql.org/docs/current/upgrading.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2022-07-15 23:42:03 Re: [Commitfest 2022-07] Begins Now
Previous Message Jacob Champion 2022-07-15 23:16:46 Moving RwF patches to a new CF