Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-10-26 22:45:11
Message-ID: 4a52a34f-57fa-7bcf-d34c-c15db40f0361@2ndquadrant.com
Lists: pgsql-hackers

On 10/25/2016 06:10 AM, Amit Kapila wrote:
> On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>>> In the results you've posted on 10/12, you've mentioned a regression with 32
>>>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>>>> ~10% difference). I have no idea what scale was used for those tests,
>>>
>>> That test was with scale factor 300 on a 4-socket POWER machine. I
>>> think I need to repeat this test with multiple readings to confirm
>>> whether it was a regression or just run-to-run variation. I will do
>>> that soon and post the results.
>>
>> As promised, I have rerun my test (3 times), and I did not see any regression.
>>
>
> Thanks Tomas and Dilip for doing detailed performance tests for this
> patch. I would like to summarise the performance testing results.
>
> 1. With an update-intensive workload, we are seeing gains of 23%~192%
> at client counts >= 64 with the group_update patch [1].
> 2. With the tpc-b pgbench workload (at 1000 scale factor), we are
> seeing gains from 12% to ~70% at client counts >= 64 [2]. Tests were
> done on an 8-socket Intel machine.
> 3. With pgbench workloads (both simple-update and tpc-b at 300 scale
> factor), we are seeing gains of 10% to >50% at client counts >= 64 [3].
> Tests were done on an 8-socket Intel machine.
> 4. To see why the patch only helps at higher client counts, we have
> done wait event testing for various workloads [4], [5], and the results
> indicate that at lower client counts the waits are mostly due to
> transactionid or ClientRead. At client counts where contention on
> CLOGControlLock is significant, this patch helps a lot to reduce that
> contention. These tests were done on an 8-socket Intel machine and a
> 4-socket POWER machine.
> 5. With a pgbench workload on unlogged tables, we are seeing gains
> from 15% to >300% at client counts >= 72 [6].
>

It's not entirely clear which of the above tests were done on unlogged
tables, and I don't see that in the referenced e-mails. That would be an
interesting thing to mention in the summary, I think.
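
Since the summary refers to the group_update patch throughout, a quick
illustration for anyone skimming the thread: the idea (as I understand
it) is to batch CLOG status updates, so that one backend applies the
updates for a whole group of waiting backends under a single
CLogControlLock acquisition, instead of each backend taking the lock on
its own. A toy, standalone sketch of that batching pattern - the names
and details below are made up for illustration, this is not the patch:

/*
 * Toy sketch of the "group update" pattern (illustration only, NOT the
 * actual patch): requests are pushed onto a lock-free list; whichever
 * thread finds the list empty becomes the leader, takes the contended
 * lock once, and applies every queued update in one pass.
 *
 * Build: cc -std=c11 -pthread group_update_sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct Request
{
    int         value;          /* what this "backend" wants applied */
    struct Request *next;
    _Atomic int done;
} Request;

static _Atomic(Request *) pending;  /* lock-free list of waiters */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;    /* stand-in for CLogControlLock */
static int  applied;                /* stand-in for the CLOG page */

static void
group_update(Request *req)
{
    Request    *head = atomic_load(&pending);

    /* push ourselves onto the list of pending requests */
    do
    {
        req->next = head;
    } while (!atomic_compare_exchange_weak(&pending, &head, req));

    if (head != NULL)
    {
        /* someone else is (or will become) the leader - wait for them */
        while (!atomic_load(&req->done))
            ;                   /* the real thing sleeps on a semaphore */
        return;
    }

    /* we are the leader: take the lock once and drain the whole list */
    pthread_mutex_lock(&big_lock);
    for (Request *r = atomic_exchange(&pending, (Request *) NULL); r != NULL;)
    {
        Request    *next = r->next;

        applied += r->value;    /* apply the queued update */
        atomic_store(&r->done, 1);
        r = next;
    }
    pthread_mutex_unlock(&big_lock);
}

static void *
worker(void *arg)
{
    Request     req = {.value = 1};

    (void) arg;
    group_update(&req);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[8];

    for (int i = 0; i < 8; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++)
        pthread_join(threads[i], NULL);
    printf("applied = %d\n", applied);  /* expect 8 */
    return 0;
}

The point being that no matter how many backends pile up, the contended
lock is taken roughly once per batch rather than once per backend.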

> There are many more tests done for the proposed patches where the
> gains are either along similar lines as above or are neutral. We do
> see regressions in some cases.
>
> 1. When data doesn't fit in shared buffers, there is a regression at
> some client counts [7], but on analysis it has been found that it is
> mainly due to the shift in contention from CLOGControlLock to
> WALWriteLock and/or other locks.

The question is why shifting the lock contention to WALWriteLock should
cause such a significant performance drop, particularly when the test was
done on unlogged tables. Or, if that is indeed the cause, how it makes
the performance drop less problematic / acceptable.

FWIW I plan to run the same test with logged tables - if it shows a
similar regression, I'll be much more worried, because that's a fairly
typical scenario (logged tables, data set > shared buffers), and we
surely can't just go and break that.

> 2. We do see in some cases that the granular_locking and
> no_content_lock patches have shown a significant increase in contention
> on CLOGControlLock. I have already shared my analysis of this upthread
> [8].

I do agree that in some cases this significantly reduces contention on
CLogControlLock. I do however think that currently the performance gains
are limited almost exclusively to unlogged tables, plus some
logged+async cases.

On logged tables it usually looks like this (i.e. a modest increase for
high client counts, at the expense of significantly higher variability):

http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64

or like this (i.e. only a partial recovery of the drop above 36 clients):

http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64

And of course, there are cases like this:

http://tvondra.bitbucket.org/#dilip-300-logged-async

I'd really like to understand why the patched results behave so
differently depending on the client count.

>
> Attached is the latest group update clog patch.
>

How is that different from the previous versions?

>
> In the last commitfest, the patch was returned with feedback to
> evaluate the cases where it can show a win, and I think the above
> results indicate that the patch has a significant benefit on various
> workloads. What I think is pending at this stage is that either a
> committer or the reviewers of this patch need to provide feedback on my
> analysis [8] for the cases where the patches are not showing a win.
>
> Thoughts?
>

I do agree the patch(es) significantly reduce CLogControlLock contention,
although with WAL logging enabled (which is what matters for most
production deployments) they pretty much only shift the contention to a
different lock (so the immediate performance benefit is 0).

Which raises the question of why to commit this patch now, before we have
a patch addressing the WAL locks. I realize this is a chicken-and-egg
problem, but my worry is that the increased WALWriteLock contention will
cause regressions in current workloads.

BTW I've run some tests with the number of clog buffers increased to
512, and the impact seems fairly positive. Compare for example these two
results:

http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip-clog-512

The first one is with the default 128 buffers, the other one is with 512
buffers. The impact on master is pretty obvious - for 72 clients the tps
jumps from 160k to 197k, and for higher client counts it gives us about
+50k tps (typically an increase from ~80k to ~130k tps). And the tps
variability is significantly reduced.

For the other workload, the results are less convincing though:

http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
http://tvondra.bitbucket.org/#dilip-300-unlogged-sync-clog-512

Interesting that master adopts the same zig-zag pattern, but shifted.
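
In case anyone wants to reproduce the clog-512 runs: the number of clog
buffers is not a GUC in 9.6, it's capped in CLOGShmemBuffers() in
src/backend/access/transam/clog.c, so testing 512 buffers means
rebuilding with (essentially) a one-line change along these lines:

Size
CLOGShmemBuffers(void)
{
    /* stock 9.6 caps this at 128; raise the cap to 512 for the test */
    return Min(512, Max(4, NBuffers / 512));
}

At least that's the obvious way to do it, as there's no knob to tune at
runtime.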

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
