Re: heavily contended lwlocks with long wait queues scale badly

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2022-10-31 10:51:06
Message-ID: CALj2ACWGHuuk0OmpEW8Kd93W1kYejia0PgfYj7wP700VhUqV8Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 27, 2022 at 10:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> But I think we can solve that fairly reasonably nonetheless. We can change
> PGPROC->lwWaiting to not just be a boolean, but have three states:
> 0: not waiting
> 1: waiting in waitlist
> 2: waiting to be woken up
>
> which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> list if we're on it. As removal from that list is protected by the wait list
> lock, there's no race to worry about.
>
> clients   patched     HEAD
>       1     60109    60174
>       2    112694   116169
>       4    214287   208119
>       8    377459   373685
>      16    524132   515247
>      32    565772   554726
>      64    587716   497508
>     128    581297   415097
>     256    550296   334923
>     512    486207   243679
>     768    449673   192959
>    1024    410836   157734
>    2048    326224    82904
>    4096    250252    32007
>
> Not perfect with the patch, but not awful either.
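
To make the three-state idea quoted above concrete, here is a minimal single-file C sketch. It assumes a doubly linked wait list guarded by a simple spinlock. The names (WaitState, WS_*, Proc, Lock, enqueue_self, wake_first, dequeue_self) are hypothetical and are not PostgreSQL's lwlock.c code. The sketch only shows the core point: the wait state is read and changed under the wait list lock, so a waiter that gives up can tell whether it is still linked. If it is, it can unlink itself directly without searching the list for itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-state replacement for a boolean "waiting" flag. */
typedef enum WaitState
{
    WS_NOT_WAITING = 0,     /* not waiting */
    WS_WAITING = 1,         /* linked into the lock's wait list */
    WS_PENDING_WAKEUP = 2   /* unlinked by a waker, wakeup not yet absorbed */
} WaitState;

typedef struct Proc
{
    WaitState    wait_state;
    struct Proc *prev;
    struct Proc *next;
} Proc;

typedef struct Lock
{
    atomic_flag wait_list_lock; /* spinlock guarding head/tail and wait_state */
    Proc       *head;
    Proc       *tail;
} Lock;

static void wait_list_acquire(Lock *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->wait_list_lock,
                                             memory_order_acquire))
        ;                       /* spin until the flag was previously clear */
}

static void wait_list_release(Lock *lock)
{
    atomic_flag_clear_explicit(&lock->wait_list_lock, memory_order_release);
}

static void lock_init(Lock *lock)
{
    atomic_flag_clear(&lock->wait_list_lock);
    lock->head = lock->tail = NULL;
}

/* O(1) unlink; the caller must hold the wait list lock. */
static void unlink_locked(Lock *lock, Proc *p)
{
    if (p->prev) p->prev->next = p->next; else lock->head = p->next;
    if (p->next) p->next->prev = p->prev; else lock->tail = p->prev;
    p->prev = p->next = NULL;
}

static void enqueue_self(Lock *lock, Proc *me)
{
    wait_list_acquire(lock);
    assert(me->wait_state == WS_NOT_WAITING);
    me->prev = lock->tail;
    me->next = NULL;
    if (lock->tail) lock->tail->next = me; else lock->head = me;
    lock->tail = me;
    me->wait_state = WS_WAITING;
    wait_list_release(lock);
}

/* Waker side: detach the first waiter and mark it as pending wakeup. */
static Proc *wake_first(Lock *lock)
{
    Proc *p;

    wait_list_acquire(lock);
    p = lock->head;
    if (p != NULL)
    {
        unlink_locked(lock, p);
        p->wait_state = WS_PENDING_WAKEUP;
    }
    wait_list_release(lock);
    return p;
}

/*
 * Waiter side: remove ourselves only if we are still on the list.
 * Returns false if a waker already unlinked us; a real implementation
 * would then still have to wait for that wakeup (not modeled here).
 */
static bool dequeue_self(Lock *lock, Proc *me)
{
    bool on_list;

    wait_list_acquire(lock);
    on_list = (me->wait_state == WS_WAITING);
    if (on_list)
    {
        unlink_locked(lock, me);
        me->wait_state = WS_NOT_WAITING;
    }
    wait_list_release(lock);
    return on_list;
}
```

If a waker has already moved a waiter to the pending state, that waiter's dequeue_self() returns false and leaves the list untouched. No race is possible because both sides decide under the same wait list lock.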

Here are results from my testing [1]. The patch looks impressive at higher
client counts. For instance, with 1024 clients HEAD reaches 103587 TPS
(run 1), whereas the patched build reaches 248702 TPS.

TPS by number of clients, two runs each:

clients     HEAD run 1     HEAD run 2  PATCHED run 1  PATCHED run 2
      1          34534          34110          34412          34087
      2          72088          72403          71733          73567
      4         135249         134421         139141         135624
      8         213045         211263         211526         211901
     16         243507         241606         241692         242819
     32         304108         295198         308198         310534
     64         375148         353580         406198         352663
    128         390658         385147         385643         381780
    256         345503         341672         338464         342483
    512         284510         295001         295559         301968
    768         146417         142341         272639         272596
   1024         103587          97721         248702         251014
   2048          34702          30229         191402         184939
   4096          12450          13179         112074         108186
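
For a side-by-side view of HEAD and the patch, the following C sketch averages each pair of runs and reports the PATCHED/HEAD throughput ratio for each client count. The figures are copied from the runs above. Row, results, speedup_at and print_summary are illustrative names. A mean of two 5-second runs is only a rough summary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* TPS from the two HEAD and two PATCHED runs above. */
typedef struct Row
{
    int  clients;
    long head[2];
    long patched[2];
} Row;

static const Row results[] = {
    {1,    {34534, 34110},   {34412, 34087}},
    {2,    {72088, 72403},   {71733, 73567}},
    {4,    {135249, 134421}, {139141, 135624}},
    {8,    {213045, 211263}, {211526, 211901}},
    {16,   {243507, 241606}, {241692, 242819}},
    {32,   {304108, 295198}, {308198, 310534}},
    {64,   {375148, 353580}, {406198, 352663}},
    {128,  {390658, 385147}, {385643, 381780}},
    {256,  {345503, 341672}, {338464, 342483}},
    {512,  {284510, 295001}, {295559, 301968}},
    {768,  {146417, 142341}, {272639, 272596}},
    {1024, {103587, 97721},  {248702, 251014}},
    {2048, {34702, 30229},   {191402, 184939}},
    {4096, {12450, 13179},   {112074, 108186}},
};

#define N_RESULTS (sizeof(results) / sizeof(results[0]))

/* Mean of the two runs. */
static double mean2(const long v[2])
{
    return (v[0] + v[1]) / 2.0;
}

/* mean(PATCHED) / mean(HEAD) for one client count; -1.0 if not measured. */
static double speedup_at(int clients)
{
    for (size_t i = 0; i < N_RESULTS; i++)
    {
        if (results[i].clients == clients)
        {
            double head = mean2(results[i].head);

            assert(head > 0.0);
            return mean2(results[i].patched) / head;
        }
    }
    return -1.0;
}

static void print_summary(void)
{
    printf("clients  HEAD(mean)  PATCHED(mean)  ratio\n");
    for (size_t i = 0; i < N_RESULTS; i++)
        printf("%7d  %10.0f  %13.0f  %5.2f\n",
               results[i].clients,
               mean2(results[i].head),
               mean2(results[i].patched),
               speedup_at(results[i].clients));
}
```

With these numbers the ratio stays within about 5% of 1.0 up to 512 clients. It is roughly 1.89 at 768 clients, 2.48 at 1024 and 8.59 at 4096. The 64-client PATCHED runs (406198 vs 352663) show that run-to-run noise is large enough that the small differences at low client counts should not be over-read.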

> I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
> but I wanted to get this out to discuss before spending further time.

For the record, here are some review comments posted in the other
thread: https://www.postgresql.org/message-id/CALj2ACXktNbG%3DK8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag%40mail.gmail.com

BTW, I've seen a sporadic crash (SEGV) in the background writer with the
patch, using the same setup [1]. I'm not sure it's really caused by the
patch: I'm unable to reproduce it now, and unfortunately I didn't
capture further details when it occurred.

[1] ./configure --prefix=$PWD/inst/ --enable-tap-tests CFLAGS="-O3" > install.log && make -j 8 install > install.log 2>&1 &

postgresql.conf:
shared_buffers = 8GB
max_wal_size = 32GB
max_connections = 4096
checkpoint_timeout = 10min

cat << EOF >> txid.sql
SELECT txid_current();
EOF

for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do
    echo -n "$c "
    ./pgbench -n -M prepared -U ubuntu postgres -f txid.sql -c$c -j$c -T5 2>&1 | grep '^tps' | awk '{print $3}'
done
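
The `grep '^tps' | awk '{print $3}'` step keeps the third whitespace-separated field of the line that starts with `tps`. The C sketch below performs the same extraction. It assumes pgbench prints a line shaped like `tps = 34534.123456 (without initial connection time)`; the exact wording varies between pgbench versions. parse_tps_line is a hypothetical helper, not part of pgbench.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring grep '^tps' | awk '{print $3}':
 * returns 1 and stores the third whitespace-separated field in *tps when
 * the line starts with "tps" and that field parses as a number, else 0.
 * Unlike awk, it rejects a non-numeric third field.
 */
static int parse_tps_line(const char *line, double *tps)
{
    char first[16], second[16];

    assert(line != NULL && tps != NULL);
    if (strncmp(line, "tps", 3) != 0)
        return 0;
    return sscanf(line, "%15s %15s %lf", first, second, tps) == 3;
}
```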

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
