Re: Adding more memory = hugh cpu load

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: alexandre - aldeia digital <adaldeia(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Adding more memory = hugh cpu load
Date: 2011-10-10 15:39:08
Lists: pgsql-performance

On 10/10/2011 10:14 AM, Leonardo Francalanci wrote:

> I don't understand: don't you want postgresql to issue the fsync
> calls when it "makes sense" (and configure them), rather than having
> the OS decide when it's best to flush to disk? That is: don't you
> want all the memory to be used for caching, unless postgresql says
> otherwise (calling fsync), instead of "as soon as 1% of memory is
> used"?

You'd think that, which is why this bites so many people. That's not
quite how it works in practice, though. The OS cache holds a lot more
than altered database and WAL files, which do get fsync'd frequently.
Beyond that, you need to worry about what happens *between* fsync calls.

On a highly loaded database, or even just a database experiencing heavy
write volume due to some kind of ETL process, your amount of dirty
memory may increase much more quickly than you expect. For example, say
your checkpoint_timeout setting is the default of five minutes. An ETL
process runs that loads 2GB of data into a table, and you're archiving
transaction logs. So you now have three possible write vectors (the
table's heap data, the WAL itself, and the archived WAL copies), not
counting temp tables and whatnot. And that's only for that one
connection; it gets more complicated if you have other OLTP connections
on the same DB.
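You can watch this dirty-memory buildup directly on Linux, which reports it in the "Dirty" and "Writeback" fields of /proc/meminfo. A minimal sketch (the parsing helper and the sample text below are illustrative; on a real system you would read the live file):

```python
# Sketch: measure how much dirty (unwritten) page-cache data Linux is
# holding, from the standard /proc/meminfo fields (units: kB).

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        if ":" in line:
            key, rest = line.split(":", 1)
            values[key.strip()] = int(rest.strip().split()[0])
    return values

# On a live system you would read the real file instead:
#   with open("/proc/meminfo") as f:
#       info = parse_meminfo(f.read())
sample = """MemTotal:       49416564 kB
Dirty:           2097152 kB
Writeback:         16384 kB"""

info = parse_meminfo(sample)
print(f"Dirty: {info['Dirty'] // 1024} MB awaiting writeback")
# → Dirty: 2048 MB awaiting writeback
```

Running that in a loop while an ETL job churns shows the Dirty figure climbing between checkpoints.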

So your page cache is now flooded with 2-6GB of dirty data, which memory
absorbs quickly and easily. With 48GB of RAM, that's well within caching
range, so the OS writes nothing until the database issues its fsync
call, at which point it suddenly wants to flush 2-6GB of data straight
to disk. Without that background trickle, you now have a flood that only
the highest-end disk controller, or a backing store full of SSDs or PCIe
NVRAM, could ever hope to absorb.
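That cheap-writes-then-expensive-fsync pattern is easy to demonstrate: buffered writes return as soon as the data lands in the page cache, while fsync blocks until every dirty page for the file reaches disk. A small sketch (the 64MB size is arbitrary, scaled down from the 2-6GB case):

```python
# Sketch: buffered writes land in the page cache and return quickly;
# fsync() then blocks until all of that dirty data reaches the disk.
import os
import tempfile
import time

CHUNK = b"x" * (1 << 20)   # 1 MB buffer
TOTAL_MB = 64              # scaled down from the 2-6GB ETL case

fd, path = tempfile.mkstemp()
try:
    t0 = time.monotonic()
    for _ in range(TOTAL_MB):
        os.write(fd, CHUNK)        # returns once data is in the page cache
    t_write = time.monotonic() - t0

    t0 = time.monotonic()
    os.fsync(fd)                   # blocks until the disk has it all
    t_fsync = time.monotonic() - t0

    print(f"buffered writes: {t_write:.3f}s, fsync: {t_fsync:.3f}s")
finally:
    os.close(fd)
    os.unlink(path)
```

On most hardware the write loop finishes in a blink while the fsync dominates; scale TOTAL_MB up toward the multi-gigabyte range and the fsync stall becomes the IO storm described above.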

That write flood massively degrades your read IOPS and degrades future
writes until it's done flushing, so all of your disk IO is saturated,
which worsens the situation further. Meanwhile you're creeping closer
and closer to your dirty_ratio setting, at which point the OS
effectively stops responding to anything until it finally finishes
flushing everything to disk. This can take a couple of minutes, but it's
not uncommon for these IO storms to last over half an hour, depending on
the quality of the disks/controller in question. During this time,
system load climbs precipitously and clients get query timeouts.

Adding more memory can actually make your system performance worse if
you don't equally increase the capability of your RAID/SAN/whatever to
compensate for increased size of write chunks.

This is counter-intuitive, but completely borne out by tests. The kernel
developers agree, or we wouldn't have dirty_bytes and
dirty_background_bytes, and they wouldn't have changed the default
ratios to 5% and 10% from the old 10% and 40%. It's just one of those
things nobody expected until machines with vast amounts of RAM became
common.
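Those byte-based knobs let you cap dirty memory at a fixed absolute size instead of a percentage of an ever-growing RAM total. A sketch of /etc/sysctl.conf settings (the specific byte values are illustrative only; size them against what your controller's write cache can actually absorb):

```
# /etc/sysctl.conf -- illustrative values, tune to your storage
# Start background writeback once 256 MB of dirty data accumulates
vm.dirty_background_bytes = 268435456
# Throttle writers once 1 GB of dirty data accumulates
vm.dirty_bytes = 1073741824
```

Apply with `sysctl -p`; setting the _bytes variants automatically zeroes the corresponding _ratio settings.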

Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604


