Re: Move PinBuffer and UnpinBuffer to atomics

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: YUriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move PinBuffer and UnpinBuffer to atomics
Date: 2015-12-08 09:53:49
Message-ID: CAPpHfdsogj38HTDhNMLE56uJy9N8-=gYa2nNuWbPujGp2n1ffQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Andres!

I'd like to share testing results on IBM E880
​ server​
.
​ At first, I'd like to mentioned that we did experiments with atomic
increment on power8, despite it doesn't have native atomic increment. Let
me explain why. CAS operation in power8 assembly looks like this.

.L1: lwarx 9,0,5
​ ​
cmpw 0,9,3
​ ​
bne- 0,.L2
​ ​
stwcx. 4,0,5
​ ​
bne- 0,.L1
.L2: isync

​So, it's a loop until value isn't changed between lwarx and stwcx. When
you put CAS inside loop it becomes loop inside loop. Atomic increment is a
single loop like CAS.

.L1: lwarx 9,0,5
​ ​
add 9,9,3
​ ​
stwcx. 9,0,5
​ ​
bne- 0,.L1
​ ​
isync

​This is why atomic increment *could be* cheaper than loop over CAS and, it
worth having experiments. ​Another idea is that we can put arbitrary logic
between lwarx and stwcx. Thus, we can implement PinBuffer using single loop
of lwarx and stwcx which could be better than loop of CAS.

​Tested patches are following:
1) pinunpin-cas.patch – moves pin/unpin buffer to atomic operations with
buffer state. PinBuffer uses loop of CAS. Same as upthread, but rebased
with current master.
2) pinunpin-increment.patch (based on pinunpin-cas.patch) – PinBuffer
changes state using atomic increment operation instead of loop of CAS. Both
refcount and usagecount are incremented at once. There is not enough bits
in 32 bit state to guarantee that usagecount doesn't overflow. I moved
usagecount to higher bits of state, thus overflow of usagecount doesn't
affect other bits. Therefore, this patch doesn't pretend to be correct.
However, we can use it to check if we should try moving this direction.
3) lwlock-increment.patch – LWLockAttemptLock change state using atomic
increment operation instead of loop of CAS. This patch does it for
LWLockAttemptLock like pinunpin-increment.patch does for PinBuffer.
Actually, this patch is not directly related to buffer manager. However,
it's nice to test loop of CAS vs atomic increment in different places.
4) ​pinunpin-ibm-asm.patch (based on pinunpin-cas.patch) – assembly
optimizations of PinBuffer and LWLockAttemptLock which makes both of them
use single loop of lwarx and stwcx.

​The following versions were compared in our benchmark.
​1) 9.4
2) 9.5
3) master
4) pinunpin-cas – ​pinunpin-cas.patch
5) pinunpin-increment – pinunpin-cas.patch + pinunpin-increment.patch
​6​
) pinunpin-cas-lwlock-increment – pinunpin-cas.patch +
lwlock-increment.patch
7​
) pinunpin-cas-lwlock-increment – pinunpin-cas.patch +
pinunpin-increment.patch
​ + ​
lwlock-increment.patch
​8) ​pinunpin-lwlock-asm – pinunpin-cas.patch + ​pinunpin-ibm-asm.patch

​See results in ibm-scalability.png. We can see that there is almost no
effect of pinunpin-increment.patch. So, atomic increment in PinBuffer
doesn't give benefit on power8. I will publish test results on Intel
platform a bit later, but I can say there is no much effect too. So, I
think, we can throw this idea away.
However, effect of lwlock-increment.patch is huge. It gives almost same
effect as assembly optimizations. I think it worth having separate thread
for discussing lwlock optimization. It's likely we can do something much
better than current lwlock-increment.patch does.
For this thread, I think we can focus on pinunpin-cas.patch without
thinking about atomic increment.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
ibm-scalability.png image/png 97.1 KB
pinunpin-cas.patch application/octet-stream 63.0 KB
pinunpin-increment.patch application/octet-stream 6.0 KB
lwlock-increment.patch application/octet-stream 3.8 KB
pinunpin-ibm-asm.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-12-08 10:04:35 Re: Move PinBuffer and UnpinBuffer to atomics
Previous Message Jeremy Harris 2015-12-08 09:52:47 Re: [PATCH] Equivalence Class Filters