From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
Subject: | Re: heavily contended lwlocks with long wait queues scale badly |
Date: | 2022-10-31 10:51:06 |
Message-ID: | CALj2ACWGHuuk0OmpEW8Kd93W1kYejia0PgfYj7wP700VhUqV8Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Oct 27, 2022 at 10:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> But I think we can solve that fairly reasonably nonetheless. We can change
> PGPROC->lwWaiting to not just be a boolean, but have three states:
> 0: not waiting
> 1: waiting in waitlist
> 2: waiting to be woken up
>
> which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> list if we're on it. As removal from that list is protected by the wait list
> lock, there's no race to worry about.
>
> client patched HEAD
> 1 60109 60174
> 2 112694 116169
> 4 214287 208119
> 8 377459 373685
> 16 524132 515247
> 32 565772 554726
> 64 587716 497508
> 128 581297 415097
> 256 550296 334923
> 512 486207 243679
> 768 449673 192959
> 1024 410836 157734
> 2048 326224 82904
> 4096 250252 32007
>
> Not perfect with the patch, but not awful either.
Here are results from my testing [1]. Results look impressive with the
patch at a higher number of clients, for instance, on HEAD TPS with
1024 clients is 103587 whereas it is 248702 with the patch.
HEAD, run 1:
1 34534
2 72088
4 135249
8 213045
16 243507
32 304108
64 375148
128 390658
256 345503
512 284510
768 146417
1024 103587
2048 34702
4096 12450
HEAD, run 2:
1 34110
2 72403
4 134421
8 211263
16 241606
32 295198
64 353580
128 385147
256 341672
512 295001
768 142341
1024 97721
2048 30229
4096 13179
PATCHED, run 1:
1 34412
2 71733
4 139141
8 211526
16 241692
32 308198
64 406198
128 385643
256 338464
512 295559
768 272639
1024 248702
2048 191402
4096 112074
PATCHED, run 2:
1 34087
2 73567
4 135624
8 211901
16 242819
32 310534
64 352663
128 381780
256 342483
512 301968
768 272596
1024 251014
2048 184939
4096 108186
> I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
> but I wanted to get this out to discuss before spending further time.
Just for the record, here are some review comments posted in the other
thread - https://www.postgresql.org/message-id/CALj2ACXktNbG%3DK8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag%40mail.gmail.com..
BTW, I've seen a sporadic crash (SEGV) with the patch in bg writer
with the same set up [1], I'm not sure if it's really because of the
patch. I'm unable to reproduce it now and unfortunately I didn't
capture further details when it occurred.
[1] ./configure --prefix=$PWD/inst/ --enable-tap-tests CFLAGS="-O3" >
install.log && make -j 8 install > install.log 2>&1 &
shared_buffers = 8GB
max_wal_size = 32GB
max_connections = 4096
checkpoint_timeout = 10min
ubuntu: cat << EOF >> txid.sql
SELECT txid_current();
EOF
ubuntu: for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do
echo -n "$c ";./pgbench -n -M prepared -U ubuntu postgres -f txid.sql
-c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2022-10-31 10:53:54 | Re: Code checks for App Devs, using new options for transaction behavior |
Previous Message | vignesh C | 2022-10-31 10:47:46 | Re: Support logical replication of DDLs |