Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2022-11-01 15:19:02
Message-ID: 725d5089-11e6-93c8-b962-67c40240451f@postgresql.org
Lists: pgsql-hackers

On 11/1/22 8:37 AM, Robert Haas wrote:
> On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>> Below are test results with v3 patch. +1 for back-patching it.

First, awesome find and proposed solution!

> The problem with back-patching stuff like this is that it can have
> unanticipated consequences. I think that the chances of something like
> this backfiring are less than for a patch that changes plans, but I
> don't think that they're nil, either. It could turn out that this
> patch, which has really promising results on the workloads we've
> tested, harms some other workload due to some other contention pattern
> we can't foresee. It could also turn out that improving performance at
> the database level actually has negative consequences for some
> application using the database, because the application could be
> unknowingly relying on the database to throttle its activity.

If someone is using the database to throttle activity for their app, I
have a bunch of follow-up questions to understand why.

> It's hard for me to estimate exactly what the risk of a patch like
> this is. I think that if we back-patched this, and only this, perhaps
> the chances of something bad happening aren't incredibly high. But if
> we get into the habit of back-patching seemingly-innocuous performance
> improvements, it's only a matter of time before one of them turns out
> not to be so innocuous as we thought. I would guess that the number of
> times we have to back-patch something like this before somebody starts
> complaining about a regression is likely to be somewhere between 3 and
> 5.

Having had the privilege of reading through the release notes for every
update release, I see on average 1-2 "performance improvements" in each
release. I believe they tend to be more negligible in impact, though.

I do understand the concerns. Say you discover your workload does have a
regression with this patch, and then a CVE fix comes out that you want
to apply -- what do you do? From reading the thread and patch, this
seems to be a lower-risk "performance fix", but the risk is still
nonzero.

While this does affect all supported versions, we could also consider
backpatching only to PG15. That at least 1/ limits the impact on users
running older versions (who would get the change only by opting into a
major version upgrade) and 2/ keeps the risk lower if there are issues,
since we're still very early in the upgrade cycle for PG15.

Users are generally happy when they can perform a simple upgrade and get
a performance boost, particularly the set of users that this patch
affects most (high throughput, high connection count). This is the type
of fix that would make headlines in a major release announcement (10x
TPS improvement w/4096 connections?!). That is also part of the tradeoff
of backpatching this: we may lose some of the higher-visibility
marketing opportunities to discuss it (though I'm sure there will be
plenty of blog posts, etc.).

Andres: when you suggested backpatching, were you thinking of the Nov
2022 release or the Feb 2023 release?

Thanks,

Jonathan
