Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2022-11-01 15:59:04
Message-ID: 20221101155904.6qsgw2hufakyau3v@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-11-01 08:37:39 -0400, Robert Haas wrote:
> On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > Below are test results with v3 patch. +1 for back-patching it.
>
> The problem with back-patching stuff like this is that it can have
> unanticipated consequences. I think that the chances of something like
> this backfiring are less than for a patch that changes plans, but I
> don't think that they're nil, either. It could turn out that this
> patch, which has really promising results on the workloads we've
> tested, harms some other workload due to some other contention pattern
> we can't foresee. It could also turn out that improving performance at
> the database level actually has negative consequences for some
> application using the database, because the application could be
> unknowingly relying on the database to throttle its activity.
>
> It's hard for me to estimate exactly what the risk of a patch like
> this is. I think that if we back-patched this, and only this, perhaps
> the chances of something bad happening aren't incredibly high. But if
> we get into the habit of back-patching seemingly-innocuous performance
> improvements, it's only a matter of time before one of them turns out
> not to be so innocuous as we thought. I would guess that the number of
> times we have to back-patch something like this before somebody starts
> complaining about a regression is likely to be somewhere between 3 and
> 5.

In general I agree, we shouldn't default to backpatching performance
fixes. The reason I am even considering it in this case is that it's a
readily reproducible issue, leading to quadratic behaviour that's extremely
hard to pinpoint. There's no increase in CPU usage, no wait event for
spinlocks, and the system doesn't even get stuck (because the wait list lock is
held after the lwlock itself has been released). I don't think users have a
decent chance of figuring out that this is the issue.

I'm not at all convinced we should backpatch either, just to be clear.

Greetings,

Andres Freund
