From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-10 07:57:56
Message-ID: 20141010075756.GL29124@awork2.int
Lists: pgsql-hackers

Hi,

On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:
> I have done a few performance tests for the above patches; the results
> are below:

Cool, thanks.

> Performance Data
> ------------------------------
> IBM POWER-7 16 cores, 64 hardware threads
> RAM = 64GB
> max_connections = 210
> Database Locale = C
> checkpoint_segments = 256
> checkpoint_timeout = 35min
> shared_buffers = 8GB
> Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
> Duration of each individual run = 5mins
> Test type - read only pgbench with -M prepared
> Other Related information about test
> a. The data is the median of 3 runs; the detailed data for the individual
> runs is attached to this mail.
> b. Both patches were applied when taking the performance data.
>
> Scale Factor - 100
>
> Patch_ver/Client_count       1       8      16      32      64     128
> HEAD                     13344  106921  196629  295123  377846  333928
> PATCH                    13662  106179  203960  298955  452638  465671
>
> Scale Factor - 3000
>
> Patch_ver/Client_count       8      16      32      64     128     160
> HEAD                     86920  152417  231668  280827  257093  255122
> PATCH                    87552  160313  230677  276186  248609  244372
>
>
> Observations
> ----------------------
> a. The patch performs really well (an increase of up to ~40%) in case all
> the data fits in shared buffers (scale factor 100).
> b. In case the data doesn't fit in shared buffers but fits in RAM
> (scale factor 3000), there is a performance increase up to a client count
> of 16; after that it starts dipping (in the above config, by up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

> The above data shows that the patch improves performance in cases
> where there is shared LWLock contention; however, there is a slight
> performance dip in the case of exclusive LWLocks (at scale factor 3000,
> exclusive LWLocks are needed for the buffer mapping tables). I am not
> sure whether this is the worst-case dip or whether, under similar
> configurations, the dip can be higher, because the trend shows the dip
> increasing with higher client counts.
>
> Brief Analysis of code w.r.t performance dip
> ---------------------------------------------------------------------
> Extra instructions w.r.t. HEAD in the exclusive-lock acquisition path
> (roughly sketched in the code below):
> a. Attempt the lock twice.
> b. Atomic operations for nwaiters in LWLockQueueSelf() and
> LWLockAcquireCommon().
> c. The spinlock now needs to be taken twice, once for self-queuing and
> again for setting releaseOK.
> d. A few function calls and some extra checks.
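
For illustration, here is a minimal, self-contained sketch of the shape of
the exclusive-acquisition path described in the quoted list: attempt the
lock, queue under the spinlock while bumping nwaiters atomically, attempt
again, and touch the spinlock once more for releaseOK. C11 atomics and a
pthread mutex stand in for PostgreSQL's atomics and per-lock spinlock; the
names, bit layout and structure are assumptions for illustration, not the
actual patch code.

/*
 * Sketch of the patched exclusive-acquisition shape described above.
 * All names and the bit layout are illustrative assumptions.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>

#define LW_VAL_EXCLUSIVE (1U << 24)     /* hypothetical "held exclusively" bit */

typedef struct LWLockSketch
{
    atomic_uint     state;      /* lock word, manipulated with CAS */
    atomic_uint     nwaiters;   /* item b: extra atomic traffic */
    bool            releaseOK;  /* item c: changed only under the spinlock */
    pthread_mutex_t waitlock;   /* stands in for the LWLock spinlock */
    /* wait queue omitted */
} LWLockSketch;

/* One CAS attempt at taking the lock exclusively. */
static bool
AttemptLockExclusive(LWLockSketch *lock)
{
    unsigned int expected = 0;

    return atomic_compare_exchange_strong(&lock->state, &expected,
                                          LW_VAL_EXCLUSIVE);
}

static void
AcquireExclusiveSketch(LWLockSketch *lock)
{
    for (;;)
    {
        /* item a: first attempt */
        if (AttemptLockExclusive(lock))
            return;

        /* items b + c: queue self, one atomic add plus one spinlock cycle */
        atomic_fetch_add(&lock->nwaiters, 1);
        pthread_mutex_lock(&lock->waitlock);
        /* ... add ourselves to the wait queue here ... */
        pthread_mutex_unlock(&lock->waitlock);

        /* item a: second attempt, to avoid sleeping if the lock just freed */
        if (AttemptLockExclusive(lock))
        {
            /* item c: undo the queueing; releaseOK needs the spinlock again */
            pthread_mutex_lock(&lock->waitlock);
            /* ... remove ourselves from the wait queue here ... */
            lock->releaseOK = true;
            pthread_mutex_unlock(&lock->waitlock);
            atomic_fetch_sub(&lock->nwaiters, 1);
            return;
        }

        /* the real code would block on a semaphore here until woken */
        atomic_fetch_sub(&lock->nwaiters, 1);
    }
}

Compared with the pre-patch path, each of these steps adds at least one
atomic read-modify-write or lock/unlock pair, which is the overhead items
a-d above are pointing at.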

Hm. I can't really see the number of atomics itself mattering - a spinning
lock will do many more atomic ops than this. But I wonder whether we
could get rid of the spinlock acquisition for releaseOK. That should be
quite possible.
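
One conceivable direction, sketched below under that assumption (this is
not the patch, just an illustration): keep releaseOK as a flag bit inside
the same atomic state word as the lock value, so it can be set and cleared
with a single atomic OR/AND instead of under the spinlock. The flag name,
bit position and struct are hypothetical.

/*
 * Sketch: releaseOK folded into the atomic state word, so no spinlock is
 * needed to change it.  Names and bit positions are hypothetical.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define LW_FLAG_RELEASE_OK (1U << 29)   /* hypothetical flag bit */

typedef struct
{
    atomic_uint state;          /* lock value and flags share one word */
} LWLockSketch2;

/* Mark the lock as "OK to wake waiters" without touching the spinlock. */
static inline void
SetReleaseOK(LWLockSketch2 *lock)
{
    atomic_fetch_or(&lock->state, LW_FLAG_RELEASE_OK);
}

/* Atomically clear the flag and report whether it was set. */
static inline bool
TestAndClearReleaseOK(LWLockSketch2 *lock)
{
    unsigned int old = atomic_fetch_and(&lock->state, ~LW_FLAG_RELEASE_OK);

    return (old & LW_FLAG_RELEASE_OK) != 0;
}

The release path could then consult the flag via the state word it already
reads, rather than through a separate spinlock-protected boolean.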

> These probably shouldn't matter much in case the backend needs to
> wait for another exclusive locker, but I am not sure what else could be
> the reason for the dip in the cases that need exclusive LWLocks.

Any chance to get a profile?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
