Re: Move unused buffers to freelist

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move unused buffers to freelist
Date: 2013-05-23 15:14:39
Message-ID: CA+TgmobawHrzH0tMCgjC=_zmcJxYUo7ttawm4P2jf_tNi6dhLg@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 21, 2013 at 3:06 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> Here are the results. The first field in each line is the number of
>> clients. The second number is the scale factor. The numbers after
>> "master" and "patched" are the median of three runs.
>>
>> 01 100 master 1433.297699 patched 1420.306088
>> 01 300 master 1371.286876 patched 1368.910732
>> 01 1000 master 1056.891901 patched 1067.341658
>> 01 3000 master 637.312651 patched 685.205011
>> 08 100 master 10575.017704 patched 11456.043638
>> 08 300 master 9262.601107 patched 9120.925071
>> 08 1000 master 1721.807658 patched 1800.733257
>> 08 3000 master 819.694049 patched 854.333830
>> 32 100 master 26981.677368 patched 27024.507600
>> 32 300 master 14554.870871 patched 14778.285400
>> 32 1000 master 1941.733251 patched 1990.248137
>> 32 3000 master 846.654654 patched 892.554222
>
> Is the above test for tpc-b?
> In the above tests, there is a performance increase of 1~8% in some cases
> and a decrease of 0.2~1.5% in others.

It's just the default pgbench workload.

>> And here's the same results for 5-minute, read-only tests:
>>
>> 01 100 master 9361.073952 patched 9049.553997
>> 01 300 master 8640.235680 patched 8646.590739
>> 01 1000 master 8339.364026 patched 8342.799468
>> 01 3000 master 7968.428287 patched 7882.121547
>> 08 100 master 71311.491773 patched 71812.899492
>> 08 300 master 69238.839225 patched 70063.632081
>> 08 1000 master 34794.778567 patched 65998.468775
>> 08 3000 master 60834.509571 patched 61165.998080
>> 32 100 master 203168.264456 patched 205258.283852
>> 32 300 master 199137.276025 patched 200391.633074
>> 32 1000 master 177996.853496 patched 176365.732087
>> 32 3000 master 149891.147442 patched 148683.269107
>>
>> Something appears to have screwed up my results for 8 clients @ scale
>> factor 300 on master,
>
> Do you mean the reading for scale factor 1000?

Yes.

>> but overall, on both the read-only and
>> read-write tests, I'm not seeing anything that resembles the big gains
>> you reported.
>
> I have not generated numbers for read-write tests; I will check that.
> For read-only tests, the performance increase is minor and different from
> what I saw.
> A few points I can think of for the difference in results:
>
> 1. In my tests I always observed the best numbers when the number of
> clients/threads equals the number of cores, which in your case should be 16.

Sure, but you also showed substantial performance increases across a
variety of connection counts, whereas I'm seeing basically no change
at any connection count.

> 2. I think for scale factors 100 and 300, there should not be much
> performance increase, as they should mostly get buffers from the freelist
> regardless of whether the bgwriter adds to it or not.

I agree.

> 3. In my tests the variation is in shared_buffers; the database size is
> always less than RAM (scale factor 1200, approx. db size 16~17GB, RAM
> 24GB), but due to the variation in shared buffers, it can lead to I/O.

Not sure I understand this.

> 4. Each run is 20 minutes; I am not sure if this makes any difference.

I've found that 5-minute tests are normally adequate to identify
performance changes on the pgbench SELECT-only workload.

>> Tests were run on a 16-core, 64-hwthread PPC64 machine provided to the
>> PostgreSQL community courtesy of IBM. Fedora 16, Linux kernel 3.2.6.
>
> To think about the difference between your runs and mine, could you please
> tell me about the points below:
> 1. How much RAM does the machine have?

64GB

> 2. Is the number of threads equal to the number of clients?

Yes.

> 3. Before starting tests I have always pre-warmed the buffers (using
> pg_prewarm, which you wrote last year); is it the same for the above
> read-only tests?

No, I did not use pg_prewarm. But I don't think that should matter
very much. First, the data was all in the OS cache. Second, on the
small scale factors, everything should end up in cache pretty quickly
anyway. And on the large scale factors, well, you're going to be
churning shared_buffers anyway, so pg_prewarm is only going to affect
the very beginning of the test.

> 4. Can you please once again run only the test where you saw variation
> (8 clients @ scale factor 1000 on master)? I have also seen that the
> performance difference is very good for certain configurations
> (scale factor, RAM, shared buffers).

I can do this if I get a chance, but I don't really see where that's
going to get us. It seems pretty clear to me that there's no benefit
on these tests from this patch. So either one of us is doing the
benchmarking incorrectly, or there's some difference in our test
environments that is significant, but none of the proposals you've
made so far seem to me to explain the difference.

> Apart from the above, I had one more observation during my investigation
> into why, in some cases, there is a small dip:
> 1. Many times it finds that the buffer on the freelist is not usable,
> meaning its refcount or usage count is not zero, due to which it has to
> spend more time under BufFreelistLock.
> I have not done any further experiments related to this finding, such as
> checking whether it really adds any overhead.
>
> Currently I am trying to find the reasons for the small dip in performance
> and see whether I can do something to avoid it. I will also run tests with
> various configurations.
>
> Any other suggestions?

Well, I think that the code in SyncOneBuffer is not really optimal.
In some cases you actually lock and unlock the buffer header an extra
time, which seems like a whole lotta extra overhead. In fact, I don't
think you should be modifying SyncOneBuffer() at all, because that
affects not only the background writer but also checkpoints.
Presumably it is not right to put every unused buffer on the free list
when we checkpoint.
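
For context, SyncOneBuffer() sits underneath both paths in bufmgr.c --
roughly like this, paraphrasing from memory rather than quoting the tree
exactly:

    /* BufferSync(), the checkpoint path: write out every buffer we marked */
    if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
        num_written++;

    /* BgBufferSync(), the background-writer path: scan ahead of the clock
     * sweep, skipping recently-used buffers */
    buffer_state = SyncOneBuffer(next_to_clean, true);

so anything you teach SyncOneBuffer() to do gets done at checkpoint time as
well.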

Instead, I suggest modifying BgBufferSync, specifically this part right here:

    else if (buffer_state & BUF_REUSABLE)
        reusable_buffers++;

What I would suggest is that if the BUF_REUSABLE flag is set here, use
that as the trigger to do StrategyMoveBufferToFreeListEnd(). That's
much simpler than the logic that you have now, and I think it's also
more efficient and more correct.
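
To make that concrete, here is a rough sketch of the shape I have in mind for
the LRU-scan loop; I'm guessing at the argument to
StrategyMoveBufferToFreeListEnd(), since I haven't checked how your patch
declares it, so substitute whatever it actually takes:

    /* remember which buffer SyncOneBuffer() examined, since next_to_clean
     * is advanced before the result flags are tested */
    int     buf_id = next_to_clean;
    int     buffer_state = SyncOneBuffer(buf_id, true);
    ...
    else if (buffer_state & BUF_REUSABLE)
    {
        reusable_buffers++;
        /* clean, refcount and usage count both zero: push it onto the
         * freelist so backends can find it without running the clock sweep */
        StrategyMoveBufferToFreeListEnd(&BufferDescriptors[buf_id]);
    }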

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
