Lots of info here that I didn't know about! Since I have one of the
latest Opensolaris builds, I guess the write throttle feature is
already in there. Sadly, the blog doesn't say what build has it
For writes, I do everything synchronized because we really need a
consistent database on disk. We can see that during large inserts, the
intend log is used a lot.
What I'm going to te testing is a smaller shared_buffers value, and a
large ARC cache, and exactly the other way around.
Another question: since we have huge tables with hundreds of millions
or rows, we partitioned the database (it actually is creating the
partitions dynamically now on inserts with very good performance :D ),
but the question is: is the size of the partions important for the
memory parameters in config file? How can we determine the optimal
size of the partition. obviously, when doing selects, you want those
preferably only needing a single partition for speed. At the moment,
that is for the majority of situations the case. But there might be
some other things to think about...
On Oct 30, 2008, at 7:27 PM, Scott Carey wrote:
> On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake
> <jd(at)commandprompt(dot)com> wrote:
> On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:
> > Remember that PostgreSQL doesn't cache anything on its own
> > if you do
> > want to hit disk it has to be in file cache.
> > By my understanding, this is absolutely false. Postgres caches
> > from tables/indexes in shared_buffers. You can make this very
> large if
> > you wish.
> You can make it very large with a potentially serious performance hit.
> It is very expensive to manage large amounts of shared buffers. It can
> also nail your IO on checkpoint if you are not careful (even with
> checkpoint smoothing). You are correct that I did not explain what I
> meant very well because shared buffers are exactly that, shared
> You can slam your I/O by havnig too large of either OS file cache or
> shared_buffers, and you have to tune both.
> In the case of large shared_buffers you have to tune postgres and
> especially the background writer and checkpoints.
> In the case of a large OS cache, you have to tune parameters to
> limit the ammount of dirty pages there and force writes out smoothly.
> Both layers attempt to delay writes for their own, often similar
> reasons, and suffer when a large sync comes along with lots of dirty
> Recent ZFS changes have been made to limit this, (http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
> in earlier ZFS versions, this is what usually killed databases --
> ZFS in some situations would delay writes too long (even if "long"
> is 5 seconds) and get in trouble. This still has to be tuned well,
> combined with good checkpoint tuning in Postgres as you mention. For
> Linux, there are similar issues that have to be tuned on many
> kernels, or up to 40% of RAM can fill with dirty pages not written
> to disk.
> Letting the OS do it doesn't get rid of the problem, both levels of
> cache share very similar issues with large sizes and dirty pages
> followed by a sync.
> The buffer cache in shared_buffers is a lot more efficient for large
> scanning queries -- A select count(*) test will be CPU bound if it
> comes from shared_buffers or the OS page cache, and in the former
> case I have seen it execute up to 50% faster than the latter, by
> avoiding calling out to the OS to get pages, purely as a result of
> less CPU used.
> However that isn't the exact same thing as a "cache" at least as I was
> trying to describe it. shared buffers are used to keep track of pages
> (as well as some other stuff) and their current status. That is not
> same as caching a relation.
> It is not possible to pin a relation to memory using PostgreSQL.
> PostgreSQL relies on the operating system for that type of caching.
> The OS can't pin a relation either, from its point of view its all
> just a bunch of disk data blocks, not relations -- so it is all
> roughly equivalent. The OS can do a bit better job at data prefetch
> on sequential scans or other predictable seek sequences (ARC stands
> for Adaptive Replacement Cache) than postgres currently does (no
> intelligent prefetch in postgres AFAIK).
> So I apologize if I made it sound like Postgres cached the actual
> relation, its just pages -- but it is basically the same thing as
> the OS cache, but kept in process closer to the code that needs it.
> Its a cache that prevents disk reads.
> My suggestion for the OP is to try it both ways, and see what is
> better for his workload / OS / Hardware combination.
> Joshua D. Drake
In response to
pgsql-performance by date
|Next:||From: Oliver Johnson||Date: 2008-10-30 21:41:40|
|Subject: CPU utilization vs. IO wait, shared buffers?|
|Previous:||From: Scott Carey||Date: 2008-10-30 18:27:09|
|Subject: Re: Configuring for maximum memory usage|