Re: Lockless StrategyGetBuffer() clock sweep

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lockless StrategyGetBuffer() clock sweep
Date: 2014-10-31 09:51:17
Message-ID: CAA4eK1JUPn1rV0ep5DR74skcv+RRK7i2inM1X01ajG+gCX-hMw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 30, 2014 at 5:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
>
> On 2014-10-30 10:23:56 +0530, Amit Kapila wrote:
> > I have a feeling that this might also have some regression at higher
> > loads (like scale_factor = 5000, shared_buffers = 8GB,
> > client_count = 128, 256) for the similar reasons as bgreclaimer patch,
> > means although both reduces contention around spin lock, however
> > it moves contention somewhere else. I have yet to take data before
> > concluding anything (I am just waiting for your other patch (wait free
> > LW_SHARED) to be committed).
>
> I have a hard time to see how this could be. In the uncontended case the
> number of cachelines touched and the number of atomic operations is
> exactly the same. In the contended case the new implementation does far
> fewer atomic ops - and doesn't do spinning.
>
> What's your theory?

I have observed that reducing contention in one path doesn't always lead
to a performance/scalability gain; rather, the contention shifts to
another lock if one exists. This is why we have to work on reducing
contention around both the BufFreeList lock and the buffer mapping locks
together. I have taken some performance data, and it seems this patch
exhibits behaviour similar to the bgreclaimer patch; I believe resolving
the contention around dynahash can improve the situation (Robert's chash
patch could be helpful).
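
For context, the lockless sweep replaces the spinlock-protected advance
of the clock hand with a single atomic fetch-add, roughly along the
lines of the simplified sketch below (my illustration only, not the
actual patch; the struct and function names are placeholders, and the
wraparound/completePasses bookkeeping is omitted):

#include "postgres.h"
#include "port/atomics.h"

/*
 * Simplified stand-in for the relevant piece of the strategy control
 * area; the real structure has more fields.
 */
typedef struct
{
    pg_atomic_uint32 nextVictimBuffer;  /* clock hand, bumped atomically */
} StrategySketch;

static StrategySketch *StrategyControl;

extern PGDLLIMPORT int NBuffers;

/*
 * Claim the next clock-sweep tick without the BufFreeList spinlock:
 * one atomic fetch-add per tick, wrapped into [0, NBuffers) by modulo.
 * Uncontended, this touches the same cache lines as the spinlock
 * version; contended, there is no spinning.
 */
static inline uint32
ClockSweepTickSketch(void)
{
    uint32      hand;

    hand = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

    return hand % (uint32) NBuffers;
}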

Performance Data
------------------------------
Configuration and Db Details
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
max_connections = 300
shared_buffers = 8GB
checkpoint_segments = 30
checkpoint_timeout = 15min
Client Count = number of concurrent sessions and threads (e.g. -c 8 -j 8)
Duration of each individual run = 5mins
Test mode - pgbench readonly (-M prepared)
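
For reference, an individual 5-minute run at client count 128 under this
configuration corresponds to a pgbench invocation along these lines
(database name assumed):

pgbench -S -M prepared -c 128 -j 128 -T 300 postgres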

The data below is the median of 3 runs; for the individual run data,
see the document attached to this mail.

Scale_Factor = 1000

Patch_ver/Client_Count       128       256
HEAD                      265502    201689
Patch                     283448    224888

Scale_Factor = 5000

Patch_ver/Client_Count       128       256
HEAD                      190435    177477
Patch                     171485    167794

The above data indicates a performance gain (roughly 7-11%) at scale
factor 1000; however, there is a regression (roughly 5-10%) at scale
factor 5000.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
perf_lockless_strategy_getbuf.ods application/vnd.oasis.opendocument.spreadsheet 18.2 KB
