Re: Move PinBuffer and UnpinBuffer to atomics

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, YUriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move PinBuffer and UnpinBuffer to atomics
Date: 2016-03-19 20:10:29
Message-ID: CAPpHfduwFXv_nrCcCP1PX-j=kEmJXc_GTHKULmQT3xCneoNyFg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 19, 2016 at 3:22 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

>
> On Mon, Mar 14, 2016 at 3:09 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> I've drawn graphs for these measurements. The variation doesn't look
>> random here. TPS is going higher from measurement to measurement. I bet
>> you did measurements sequentially.
>> I think we should do more measurements until TPS stop growing and beyond
>> to see accumulate average statistics. Could you please do the same tests
>> but 50 runs.
>>
>
> I have taken readings at different client counts (1, 2 and 4).
>
> 1. Avg: the average taken after discarding the lowest 5 and highest 5 readings.
> 2. With 4 clients I have only 30 readings.
>
> Summary:
>
> I think if we check the avg or median, at least at 1 or 2 clients head
> always looks like the winner, but the same is not the case with 4 clients.
> And with 4 clients I can see much more fluctuation in the readings.
>
>
>             Head        Patch       Head        Patch       Head        Patch
>             (1 Client)  (1 Client)  (2 Client)  (2 Client)  (4 Client)  (4 Client)
> Avg:        19628       19578       37180       36536       70044       70731
> Median:     19663       19581       37967       37484       73003       75376
> Below are all the readings (arranged in sorted order).
>
> Runs  Head        Patch       Head        Patch       Head        Patch
>       (1 Client)  (1 Client)  (2 Client)  (2 Client)  (4 Client)  (4 Client)
>   1   18191       18153       29454       26128       49931       47210
>   2   18365       18768       31218       26956       53358       47287
>   3   19053       18769       31808       29286       53575       55458
>   4   19128       18915       31860       30558       54282       55794
>   5   19160       18948       32411       30945       56629       56253
>   6   19177       19055       32453       31316       57595       58147
>   7   19351       19232       32571       31703       59490       58543
>   8   19353       19239       33294       32029       63887       58990
>   9   19361       19255       33936       32217       64718       60233
>  10   19390       19297       34361       32767       65737       68210
>  11   19452       19368       34563       34487       65881       71409
>  12   19478       19387       36110       34907       67151       72108
>  13   19488       19388       36221       34936       70974       73977
>  14   19490       19395       36461       35068       72212       74891
>  15   19512       19406       36712       35298       73003       75376
>  16   19538       19450       37104       35492       74842       75468
>  17   19547       19487       37246       36648       75400       75515
>  18   19592       19521       37567       37263       75573       75626
>  19   19627       19536       37707       37430       75798       75745
>  20   19661       19556       37958       37461       75834       75868
>  21   19666       19607       37976       37507       76240       76161
>  22   19701       19624       38060       37557       76426       76162
>  23   19708       19643       38244       37717       76770       76333
>  24   19748       19684       38272       38285       77011       77009
>  25   19751       19694       38467       38294       77114       77168
>  26   19776       19695       38524       38469       77630       77318
>  27   19781       19709       38625       38642       77865       77550
>  28   19786       19765       38756       38643       77912       77904
>  29   19796       19823       38939       38649       78242       78736
>  30   19826       19847       39139       38659
>  31   19837       19899       39208       38713
>  32   19849       19909       39211       38837
>  33   19854       19932       39230       38876
>  34   19867       19949       39249       39088
>  35   19891       19990       39259       39148
>  36   20038       20085       39286       39453
>  37   20083       20128       39435       39563
>  38   20143       20166       39448       39959
>  39   20191       20198       39475       40495
>  40   20437       20455       40375       40664
>

So, I think there really is some regression here with a small number of
clients. I have the following hypothesis about that. In some cases we use
more atomic operations than before. For instance, in BufferAlloc we have the
following block of code. The previous code did this without atomic
operations, relying on the spinlock. So, we have 2 extra atomic operations
here, and a similar situation in other places.

pg_atomic_fetch_and_u32(&buf->state,
                        ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
                          BM_CHECKPOINT_NEEDED | BM_IO_ERROR |
                          BM_PERMANENT | BUF_USAGECOUNT_MASK));
if (relpersistence == RELPERSISTENCE_PERMANENT)
    pg_atomic_fetch_or_u32(&buf->state,
                           BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE);
else
    pg_atomic_fetch_or_u32(&buf->state,
                           BM_TAG_VALID | BUF_USAGECOUNT_ONE);
Actually, we can behave like the old code and make such modifications
without increasing the number of atomic operations: we can just calculate
the new value of state (including unsetting the BM_LOCKED flag) and write it
to the buffer in a single operation. In this case we don't require more
atomic operations. However, it then becomes unsafe to use an atomic
decrement in UnpinBuffer(). Instead, we can use a loop of CAS there which
waits for the buffer header to be unlocked, like PinBuffer() does. I'm not
sure whether this affects multicore scalability, but I think it is worth
trying.
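
To illustrate, here is a minimal sketch of both ideas (not the exact patch
code: it assumes LockBufHdr() returns the locked value of buf->state, and
uses patch-style macros such as BM_LOCKED and BUF_REFCOUNT_ONE, which may
differ in detail from the attached patch):

/*
 * Sketch 1: set the new buffer flags with a single atomic write while the
 * header is locked, instead of fetch-and followed by fetch-or.  Assumes
 * LockBufHdr() returns the locked value of buf->state.
 */
uint32      state = LockBufHdr(buf);

state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED |
           BM_IO_ERROR | BM_PERMANENT | BUF_USAGECOUNT_MASK);
if (relpersistence == RELPERSISTENCE_PERMANENT)
    state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
else
    state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

/* one write publishes the new flags and unsets BM_LOCKED */
pg_atomic_write_u32(&buf->state, state & ~BM_LOCKED);

/*
 * Sketch 2: UnpinBuffer() without an atomic decrement -- loop on CAS,
 * waiting for the header to become unlocked, the way PinBuffer() does.
 * (Real code would use a spin delay while BM_LOCKED is set.)
 */
uint32      old_state = pg_atomic_read_u32(&buf->state);
uint32      new_state;

for (;;)
{
    if (old_state & BM_LOCKED)
    {
        /* header locked by someone else: re-read and retry */
        old_state = pg_atomic_read_u32(&buf->state);
        continue;
    }
    new_state = old_state - BUF_REFCOUNT_ONE;
    if (pg_atomic_compare_exchange_u32(&buf->state, &old_state, new_state))
        break;
    /* on failure, old_state was refreshed with the current value; retry */
}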

So, this idea is implemented in the attached patch. Please try it for both
the regression at a lower number of clients and scalability at a large
number of clients.

Other changes in this patch:
1) PinBuffer() and UnpinBuffer() use exponential backoff in the spin delay,
like LockBufHdr() does (see the sketch after this list).
2) The previous patch contained a revert
of ac1d7945f866b1928c2554c0f80fd52d7f977772. That was not intentional; now
it doesn't contain that revert.
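
Roughly, the backoff in the spin delay looks like this (an illustrative
sketch only; the constants and variable names are placeholders, not the
patch's actual values):

/* Illustrative sketch of exponential backoff while waiting for BM_LOCKED. */
int         spins = 0;
long        delay_us = 0;           /* 0 = still pure spinning */

while (pg_atomic_read_u32(&buf->state) & BM_LOCKED)
{
    if (++spins > 1000)             /* placeholder spin threshold */
    {
        if (delay_us == 0)
            delay_us = 1000;        /* start sleeping at 1 ms */
        pg_usleep(delay_us);
        delay_us = Min(delay_us * 2, 1000000);  /* double, cap at 1 s */
        spins = 0;
    }
}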

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
pinunpin-cas-4.patch application/octet-stream 84.7 KB
