Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2022-11-01 17:41:23
Message-ID: 20221101174123.5scswkltxnjozknk@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-11-01 11:19:02 -0400, Jonathan S. Katz wrote:
> This is the type of fix that would make headlines in a major release
> announcement (10x TPS improvement w/4096 connections?!). That is also part
> of the tradeoff of backpatching this, is that we may lose some of the higher
> visibility marketing opportunities to discuss this (though I'm sure there
> will be plenty of blog posts, etc.)

(read the next paragraph with the caveat that results below prove it somewhat
wrong)

I don't think the fix is as big a deal as the above makes it sound - you need
to do somewhat extreme things to hit the problem. Yes, it drastically improves
the scalability of e.g. doing SELECT txid_current() across as many sessions as
possible - but that's not something you normally do (it was a good candidate
to show the problem because it's a single lock but doesn't trigger WAL flushes
at commit).

You can probably hit the problem with many concurrent single-tx INSERTs, but
you'd need to have synchronous_commit=off or fsync=off (or a very expensive
server class SSD with battery backup) and the effect is likely smaller.

> Andres: when you suggested backpatching, were you thinking of the Nov 2022
> release or the Feb 2023 release?

I wasn't thinking that concretely. Even if we decide to backpatch, I'd be very
hesitant to do it in a few days.

<goes and runs test while in meeting>

I tested with browser etc running, so this is plenty noisy. I used the best of
the two pgbench -T21 -P5 tps, after ignoring the first two periods (they're
too noisy). I used an ok-ish NVMe SSD, rather than the expensive one that
has "free" fsync.
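
For anyone wanting to reduce the numbers the same way, here is a minimal
sketch of how the per-period tps could be picked out of pgbench's -P progress
output. It assumes my reading of the method: -T21 -P5 yields four progress
lines, the first two are dropped, and the better of the remaining two is
kept. The function name best_tps and the sample values are hypothetical and
purely illustrative, not measurements from this test.

```python
import re

# Matches pgbench -P progress lines such as:
#   progress: 5.0 s, 6196.2 tps, lat 2.581 ms stddev 0.512
PROGRESS_RE = re.compile(r"^progress: ([\d.]+) s, ([\d.]+) tps")


def best_tps(output: str, skip: int = 2) -> float:
    """Return the highest per-period tps after dropping the first `skip` periods."""
    tps = []
    for line in output.splitlines():
        m = PROGRESS_RE.match(line.strip())
        if m:
            tps.append(float(m.group(2)))
    kept = tps[skip:]
    if not kept:
        raise ValueError("not enough progress lines after skipping warm-up periods")
    return max(kept)


# Made-up sample output (NOT real results), shaped like pgbench -T21 -P5.
SAMPLE = """\
progress: 5.0 s, 500.0 tps, lat 1.000 ms stddev 0.100
progress: 10.0 s, 90.0 tps, lat 1.000 ms stddev 0.100
progress: 15.0 s, 120.0 tps, lat 1.000 ms stddev 0.100
progress: 20.0 s, 130.5 tps, lat 1.000 ms stddev 0.100
"""

if __name__ == "__main__":
    print(best_tps(SAMPLE))
```

The first two periods are skipped by position rather than by timestamp, so
the sketch only makes sense if the progress interval is the same for every
run being compared.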

synchronous_commit=on:

clients   master (tps)   fix (tps)
     16           6196        6202
     64          25716       25545
    256          90131       90240
   1024         128556      151487
   2048          59417      157050
   4096          32252      178823

synchronous_commit=off:

clients   master (tps)   fix (tps)
     16         409828      409016
     64         454257      455804
    256         304175      452160
   1024         135081      334979
   2048          66124      291582
   4096          27019      245701

Hm. That's a bigger effect than I anticipated. I guess sc=off isn't actually
required, due to the level of concurrency making group commit very
effective.
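
To make the size of the effect easier to see, the ratio fix/master can be
computed per client count. The sketch below just does that arithmetic on the
tps numbers reported above; the names RESULTS and speedup are made up for
illustration.

```python
# tps values copied from the two tables above: {clients: (master, fix)}
RESULTS = {
    "on": {
        16: (6196, 6202),
        64: (25716, 25545),
        256: (90131, 90240),
        1024: (128556, 151487),
        2048: (59417, 157050),
        4096: (32252, 178823),
    },
    "off": {
        16: (409828, 409016),
        64: (454257, 455804),
        256: (304175, 452160),
        1024: (135081, 334979),
        2048: (66124, 291582),
        4096: (27019, 245701),
    },
}


def speedup(master: int, fix: int) -> float:
    """Throughput of the patched build relative to master."""
    return fix / master


if __name__ == "__main__":
    for mode, rows in RESULTS.items():
        print(f"synchronous_commit={mode}")
        for clients, (master, fix) in rows.items():
            print(f"  {clients:>5} clients: {speedup(master, fix):.2f}x")
```

At 4096 clients that works out to roughly 5.5x with synchronous_commit=on and
roughly 9.1x with synchronous_commit=off; at 16 and 64 clients the two builds
are within noise of each other.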

This is without an index, serial column or anything. But a quick comparison
for just 4096 clients shows there is still a big difference if I create a
serial primary key:
master: 26172
fix: 155813

Greetings,

Andres Freund
