Re: Contention on LWLock buffer_content, due to SHARED lock(?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jens-Wolfhard Schicke-Uffmann <drahflow(at)gmx(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Contention on LWLock buffer_content, due to SHARED lock(?)
Date: 2019-12-10 16:44:17
Message-ID: 20191210164417.gus4ctgtfwu6z66u@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-12-09 23:10:36 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to the taking of a
> SHARED lock (I think):

> Three tables were involved, simplified case:
>
> CREATE TABLE global_config (id BIGINT PRIMARY KEY);
>
> CREATE TABLE b (
> id BIGINT PRIMARY KEY,
> config_id BIGINT REFERENCES global_config (id)
> );
>
> CREATE TABLE c (
> id BIGINT PRIMARY KEY,
> config_id BIGINT REFERENCES global_config (id)
> );
>
> (I suppose having both b + c doesn't make a difference, but
> maybe it was relevant, so I'm including it.)
>
> Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
> SELECTs on global_config (again by id).
> As the name suggests, there were only very few rows in
> global_config; specifically, only one was referenced by all
> INSERTs + UPDATEs on b + c.
>
> On lighter load, all three types of queries were taking <1ms (as
> expected); as load grew, all three climbed together to ~50ms average
> execution time. The AWS RDS console showed wait on LWLock:buffer_content
> as the main contributor to that time.
>
> Checking the code, I concluded that I observed lock contention
> on the lock taken at the beginning of heap_lock_tuple, where
> an exclusive buffer content lock is held while recording the
> SHARE lock into the tuple, the WAL, and the multiXact. I don't know
> the actual number, but potentially up to 7000 active
> transactions were holding a SHARE lock on that row, which could have
> performance implications while scanning for multiXact memberships.

When you say "7000 active transactions" - do you mean to say that you
have set max_connections to something higher than that, and you actually
have that many concurrent transactions?
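
FWIW, a quick way to check the real concurrency (a sketch using the
standard pg_stat_activity view; which states to count is a judgment
call) would be something like:

    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state;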

> Semantically, all that lock traffic was superfluous, as the
> global_config row's key was in no danger of being changed.

Well, postgres can't know that.
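
For context: the RI trigger machinery already takes the weakest row
lock it can. From memory (a sketch; the exact statement ri_triggers.c
builds may differ in detail), every INSERT/UPDATE that sets b.config_id
or c.config_id effectively runs

    SELECT 1 FROM ONLY global_config x
    WHERE id = $1 FOR KEY SHARE OF x;

so all writers converge on a row-level lock on the same global_config
tuple.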

> As this situation (some global, essentially static entity referenced
> by a heavily written table) seems not uncommon, I wonder:
>
> 1. Does the above analysis sound about right?

Hard to know without additional data.

> 2. If so, would it be worthwhile to develop a solution?

Possible, but I'm not sure it's worth the complexity.

I'd definitely like to see a proper reproducer and profile for this,
before investigating further.
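
A sketch of what I'd imagine such a reproducer to look like, using the
schema from your mail (untested; the file name, key range, and client
counts are made up):

    -- one-time setup, after creating the three tables:
    INSERT INTO global_config (id) VALUES (1);

    -- fk_contention.sql, driven by e.g.
    --   pgbench -n -c 96 -j 96 -T 60 -f fk_contention.sql
    \set bid random(1, 1000000)
    INSERT INTO b (id, config_id) VALUES (:bid, 1)
        ON CONFLICT (id) DO UPDATE SET config_id = 1;

Every client then writes rows of b that all reference the single
global_config row, which should recreate the multiXact growth you
describe.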

> I was thinking along the lines of introducing a multiXact
> representation of "everyone": Instead of meticulously recording every
> locking + completing transaction in a multiXact, after a certain
> number of transactions have accumulated in a single multiXact, it is
> approximated as "everyone". If a transaction later finds that a SHARE
> lock is held by "everyone", the tuple would need no further modification

I think the big problem with a strategy like this is that it's prone to
generate deadlocks that aren't present in the "original" scheduling: an
exclusive-lock waiter now has to wait out every transaction covered by
"everyone", including ones that never actually locked the tuple, which
introduces wait-for edges that the precise bookkeeping wouldn't have.

> (not sure if this could even be checked without taking an exclusive
> buffer lock).

It should only require a share lock.

> The hard part would probably be to ensure that an
> attempt to obtain an EXCLUSIVE lock eventually succeeds against a
> SHARE lock held by "everyone".

Note that this is a seriously complicated area of the code, where it's
very easy to create new bugs that aren't easily testable. I think we'd
need a very convincing use-case for improvements around the problem you
outline, and a relatively simple solution, to counter the stability
concerns.

Greetings,

Andres Freund
