From: Rayson Ho <raysonlogin(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: cache false-sharing in lwlocks
Date: 2010-01-11 17:24:43
Message-ID: 73a01bf21001110924u9d754dby96bdd14f78e2840d@mail.gmail.com
Lists: pgsql-performance
Hi,
LWLockPadded is either 16 or 32 bytes, so on modern systems (e.g. Core 2
or AMD Opteron [1]) with a cache line size of 64 bytes, two adjacent
lwlocks can land on the same cache line and suffer false sharing.
I changed LWLOCK_PADDED_SIZE in src/backend/storage/lmgr/lwlock.c to
64 and ran the sysbench OLTP read-only benchmark, and got a slight
improvement in throughput:
Hardware: single-socket Core 2, quad-core, Q6600 @ 2.40GHz
Software: Linux 2.6.28-17, glibc 2.9, gcc 4.3.3
PostgreSQL: 8.5alpha3
sysbench parameters: sysbench --num-threads=4 --max-requests=0
--max-time=120 --oltp-read-only=on --test=oltp
original: 3227, 3243, 3243
after: 3256, 3255, 3253
So there is a speedup of 1.005x, or, as it is more commonly put, a
0.5% improvement.
However, this is a single-socket machine, so none of the cache-coherency
traffic needs to go off-chip. Can someone with a multi-socket machine
help me run some tests, so that we can get a better idea of how this
change (patch attached) performs on bigger systems?
Thanks,
Rayson
P.S. I googled and found earlier discussions about increasing
LWLOCK_PADDED_SIZE, but the previous work was done on an IBM POWER
system, and the benchmark used was ApacheBench. IMO, that setup was too
complex to measure a small performance improvement in PostgreSQL itself.
[1] Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA
Multiprocessor Systems Application Note
Attachment: lwlock.patch.txt (text/plain, 703 bytes)