Re: random_page_cost vs seq_page_cost

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: random_page_cost vs seq_page_cost
Date: 2012-02-13 02:04:28
Message-ID: 4F386FAC.1090003@2ndQuadrant.com
Lists: pgsql-hackers

On 02/11/2012 07:53 PM, Jeff Janes wrote:
> Has it ever been well-characterized what the problem is with >8GB?
> I've used shared buffers above that size for testing purposes and
> could never provoke a problem with it.

If anyone ever manages to characterize it well, we might actually make
progress on isolating and fixing it. All we have so far are a couple of
application-level test results suggesting a higher value caused
performance to drop. The first public one I remember was from Jignesh;
http://archives.postgresql.org/pgsql-performance/2008-02/msg00184.php
quotes him on where he found the roll-off was on Solaris. What we
really need to stomp this one down is for someone to reproduce the
same thing, then show profiler output in each case. Note that
Jignesh's report included a significant amount of filesystem-level
tuning, using things like more direct I/O, and that might be a
necessary requirement to run into the exact variant of this limitation
he mentioned.
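
For anyone who wants to try, here's a minimal sketch of what such a
reproduction might look like on Linux (the scale and client counts are
made up for illustration; substitute oprofile or DTrace for perf as
appropriate):

    # Initialize a data set comfortably larger than shared_buffers
    pgbench -i -s 1000 pgbench
    # Run the same workload once with shared_buffers=8GB and once
    # with a larger value, restarting the server in between
    pgbench -c 32 -j 4 -T 600 pgbench
    # While each test runs, grab a system-wide profile to compare
    perf record -a -g sleep 60
    perf report

The interesting result would be profiles that look clearly different
between the two settings, pointing at where the extra time goes.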

I haven't spent a lot of time looking for this problem myself. What
I've heard second-hand from more than one person now is a) settings
larger than 8GB can still be an improvement for some people, and b)
simple benchmarks don't always run into this problem. I have noted
that the few public and private reports I've gotten all suggest the
problems show up on benchmarks of more complicated workloads. I think
Jignesh mentioned this being obvious in the more complicated
TPC-derived benchmarks, not in simple things like pgbench; I may be
misquoting him though. And given that one of the suspected causes was
excessive contention on some lock, it's quite possible this one is
already gone in 9.2, given the large number of lock-related issues
that have been squashed so far in this release.
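
One way to check the lock contention theory directly would be to
compare lock wait data at both settings. A rough sketch, assuming a
from-source build (LWLOCK_STATS is a compile-time option that dumps
per-lock acquisition and block counters to stderr as each backend
exits):

    # Build the server with lightweight lock statistics enabled
    CPPFLAGS="-DLWLOCK_STATS" ./configure --prefix=$HOME/pgsql
    make && make install
    # Run the workload, then look in the server's stderr/log for the
    # per-lock counter lines and compare block counts between runs

If the block counts on one particular lock balloon at the larger
setting, that would finally give us something specific to chase.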

All of those disclaimers are why I think no one has pushed to put a note
about this in the official docs. Right now the only suggested limit is
this one:

"The useful range for shared_buffers on Windows systems is generally
from 64MB to 512MB."

The most common practical limit I've run into with large
shared_buffers settings hits earlier than 8GB: checkpoint spike
issues. I have installs that started with shared_buffers in the 4 to
8GB range, where we saw badly spiking I/O at checkpoint sync time.
Lowering the database's cache can result in smarter write scheduling
within the OS, improving latency--even though total writes are
actually higher if you measure what flows from the database to the OS.
The existence of that side of the latency vs. throughput trade-off is
one of the main reasons I haven't gone chasing after problems with
really large shared_buffers settings.
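
The easiest way to see whether you're in that situation is checkpoint
logging; a minimal sketch (the numbers here are illustrative, and the
exact log wording varies a bit by version):

    # postgresql.conf
    log_checkpoints = on
    # Then watch for checkpoints where the sync phase dominates:
    #   LOG:  checkpoint complete: wrote 25000 buffers (9.5%); ...
    #         write=269.5 s, sync=47.2 s, total=317.1 s

Long sync times relative to the write phase, combined with latency
spikes at checkpoint time, are the signature of this problem.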

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
