Re: Scaling shared buffer eviction

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-19 11:21:43
Message-ID: CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=VOOLUds-A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 16, 2014 at 10:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Sep 16, 2014 at 8:18 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
>> In most cases performance with the patch is slightly lower than HEAD;
>> the difference is generally less than 1% and in a case or two close to
>> 2%. I think the main reason for the slight difference is that when the
>> size of shared buffers is almost the same as the data size, very few
>> buffers are needed from the clock sweep. For example, in the first case
>> (shared buffers of 12286MB) at most 256 additional buffers (2MB) are
>> actually needed via the clock sweep, whereas bgreclaimer will put 2000
>> additional buffers on the freelist (the high water mark, since 0.5% of
>> shared buffers is greater than 2000). So bgreclaimer does some extra
>> work when it is not required, and that also leads to the condition you
>> mention below (the freelist will contain buffers that have already been
>> touched since we added them). In case 2 (12166MB), we need more than
>> 2000 additional buffers, but not too many more, so it can have a
>> similar effect.
>>
>
> So there are two suboptimal things that can happen and they pull in
> opposite directions. I think you should instrument the server how often
> each is happening. #1 is that we can pop a buffer from the freelist and
> find that it's been touched. That means we wasted the effort of putting it
> on the freelist in the first place. #2 is that we can want to pop a buffer
> from the freelist and find it empty and thus be forced to run the clock
> sweep ourselves. If we're having problem #1, we could improve things by
> reducing the water marks. If we're having problem #2, we could improve
> things by increasing the water marks. If we're having both problems, then
> I dunno. But let's get some numbers on the frequency of these specific
> things, rather than just overall tps numbers.
>

Specific numbers for both configurations for which I posted data in the
previous mail are as follows:

Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_touched_freelist - count of buffers that backends found touched
after popping them from the freelist.
buffers_backend_clocksweep - count of buffer allocations not satisfied
from the freelist.

buffers_alloc                 1531023
buffers_backend_clocksweep    0
buffers_touched_freelist      0

Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)
Client and Thread Count = 64

buffers_alloc                 1531010
buffers_backend_clocksweep    0
buffers_touched_freelist      0

In both of the above cases I took data multiple times to ensure
correctness. From the data it is evident that in both configurations
all requests are satisfied from the initial freelist. Basically, the
configured shared buffers (12286MB = 1572608 buffers and
12166MB = 1557248 buffers) are sufficient to contain the entire
working set of the pgbench run.
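
To make it clearer what these counters capture, below is a small
standalone sketch (this is not the code from the attached stats patch;
the types and names are simplified stand-ins for illustration) of the
two points in buffer allocation where they would be bumped:
buffers_touched_freelist when a buffer popped from the freelist turns
out to have been touched, and buffers_backend_clocksweep when the
freelist is empty and the backend has to run the clock sweep itself.

#include <stdio.h>

#define NBUFFERS 16

typedef struct BufferDescSketch
{
    int buf_id;
    int usage_count;    /* nonzero means "touched" since it was freed */
    int freeNext;       /* next buffer on the freelist, -1 if none */
} BufferDescSketch;

static BufferDescSketch buffers[NBUFFERS];
static int firstFreeBuffer = -1;    /* head of the freelist */
static int nextVictimBuffer = 0;    /* clock-sweep hand */

/* counters corresponding to the two proposed stats */
static long buffers_touched_freelist = 0;
static long buffers_backend_clocksweep = 0;

static BufferDescSketch *
sketch_get_buffer(void)
{
    /* First try the freelist, as the patched allocation path would. */
    while (firstFreeBuffer >= 0)
    {
        BufferDescSketch *buf = &buffers[firstFreeBuffer];

        firstFreeBuffer = buf->freeNext;

        if (buf->usage_count == 0)
            return buf;             /* still free: use it */

        /* popped a buffer that was touched after being put on the list */
        buffers_touched_freelist++;
    }

    /* Freelist exhausted: this allocation falls back to the clock sweep. */
    buffers_backend_clocksweep++;
    for (;;)
    {
        BufferDescSketch *buf = &buffers[nextVictimBuffer];

        nextVictimBuffer = (nextVictimBuffer + 1) % NBUFFERS;
        if (buf->usage_count == 0)
            return buf;
        buf->usage_count--;         /* decrement and keep sweeping */
    }
}

int
main(void)
{
    int i;

    /* seed a tiny buffer pool with every buffer on the freelist */
    for (i = 0; i < NBUFFERS; i++)
    {
        buffers[i].buf_id = i;
        buffers[i].usage_count = 0;
        buffers[i].freeNext = (i + 1 < NBUFFERS) ? i + 1 : -1;
    }
    firstFreeBuffer = 0;

    /* pretend buffer 0 got touched while still sitting on the freelist */
    buffers[0].usage_count = 1;

    for (i = 0; i < NBUFFERS; i++)
        (void) sketch_get_buffer();

    printf("buffers_touched_freelist   %ld\n", buffers_touched_freelist);
    printf("buffers_backend_clocksweep %ld\n", buffers_backend_clocksweep);
    return 0;
}

In the runs above both counters stayed at zero, which is consistent
with every allocation being served from the initial freelist.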

So now the question is why we see a small variation (<1%) in the data
when all of it fits in shared buffers. The reason could be that we have
added a few extra instructions (due to the increase in the
StrategyControl structure size, an additional function call, and one or
two new assignments) in the buffer allocation path (these extra
instructions only matter until all the data pages are associated with
buffers; after that, control doesn't even reach StrategyGetBuffer()),
or it may simply be run-to-run variation across different binaries.
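
As a rough illustration of what those few extra instructions could
amount to, here is a hypothetical sketch (not code taken from the v9
patch; the field and function names below are made up for this example)
of the kind of extra bookkeeping on the freelist path: a couple of new
StrategyControl fields, one decrement, and a cheap water-mark check
that wakes bgreclaimer when the freelist runs low.

#include <stdbool.h>
#include <stddef.h>

typedef struct LatchSketch
{
    bool is_set;
} LatchSketch;

typedef struct StrategyControlSketch
{
    int          firstFreeBuffer;       /* head of the freelist, -1 if empty */
    int          numFreeListBuffers;    /* new field: current freelist length */
    int          freelistLowWaterMark;  /* new field: wake bgreclaimer below this */
    LatchSketch *bgreclaimerLatch;      /* new field: latch of the reclaim worker */
} StrategyControlSketch;

static void
set_latch(LatchSketch *latch)
{
    latch->is_set = true;               /* stand-in for SetLatch() */
}

/*
 * Roughly the extra work per allocation served from the freelist:
 * one or two new assignments plus, occasionally, one function call.
 */
static void
after_freelist_pop(StrategyControlSketch *sc)
{
    sc->numFreeListBuffers--;           /* new assignment */

    if (sc->numFreeListBuffers <= sc->freelistLowWaterMark &&
        sc->bgreclaimerLatch != NULL)
        set_latch(sc->bgreclaimerLatch);    /* additional function call */
}

int
main(void)
{
    LatchSketch latch = { false };
    StrategyControlSketch sc;

    sc.firstFreeBuffer = 0;
    sc.numFreeListBuffers = 5;
    sc.freelistLowWaterMark = 4;
    sc.bgreclaimerLatch = &latch;

    after_freelist_pop(&sc);            /* freelist drops to 4, latch is set */
    return latch.is_set ? 0 : 1;
}

Once all the data pages are associated with buffers this path is not
reached at all, which matches the observation above.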

I went ahead and took data for cases where shared buffers are a tiny
bit (0.1% and 0.05%) smaller than the workload, sized based on the
buffer allocations observed in the above cases (~1531000 buffers, i.e.
roughly 11961MB at 8KB per buffer).

Performance Data
-------------------------------

Scale Factor - 800
Shared_Buffers - 11950MB

Client_Count/Patch_Ver      8       16      32      64      128
HEAD                        68424   132540  195496  279511  283280
sbe_v9                      68565   132709  194631  284351  289333

Scale Factor - 800
Shared_Buffers - 11955MB

Client_Count/Patch_Ver      8       16      32      64      128
HEAD                        68331   127752  196385  274387  281753
sbe_v9                      68922   131314  194452  284292  287221

The above data indicates that performance is better with the patch in
almost all cases, especially at high concurrency (64 and 128 client
counts).

The overall conclusion is that with the patch:
a. When the data fits in RAM but not completely in shared buffers,
performance/scalability is quite good, even if shared buffers are just
a tiny bit smaller than the data.
b. When shared buffers are sufficient to contain all the data, there is
a slight difference (<1%) in performance.

>
>> d. Let's not do anything; if a user has such a configuration, he should
>> be educated to configure shared buffers in a better way, and/or the
>> performance hit doesn't seem to be justified to do any further
>> work.
>>
>
> At least worth entertaining.
>

Based on further analysis, I think this is the way to go.

Attached are the patch for the new stat (buffers_touched_freelist), in
case you want to run the patch with it, and the detailed
(individual-run) performance data.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
scalable_buffer_eviction_v9_stats.patch application/octet-stream 5.6 KB
perf_read_scalability_data_v9.ods application/vnd.oasis.opendocument.spreadsheet 19.2 KB
