Shared buffers, db transactions commited, and write IO on Solaris

From: Erik Jones <erik(at)myemma(dot)com>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Shared buffers, db transactions commited, and write IO on Solaris
Date: 2007-03-28 19:56:47
Message-ID: 9B7F4B34-7E96-4501-BCE0-D0BED739530A@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings,

We've recently made a couple changes to our system that have resulted
in a drastic increase in performance as well as some very confusing
changes to the database statistics, specifically
pg_stat_database.xact_commit. Here's the details:

OS: Solaris10 x86
Server: Sunfire X4100, 8GB Memory, 2 Dual Core Opterons
Postgres 8.2.3
Disk array:
Sun STK 6130 + CSM100 SATA tray, dual channel MPXIO, 15k drives,
RAID5 across 14 disks
WAL logs on SATA RAID10
SAN architecture, 2 brocade FABRIC switches

The changes we made were:

Increase shared buffers from 150000 to 200000
Set the disk mount for the data directory to use forcedirectio (added
that mount option that to the /etc/vfstab entry (ufs fs))

So, the reason we did this was that for months now we'd been
experiencing extremely high IO load from both the perspective of the
OS and the database, specifically where writes were concerned.
During peak hourse it wasn't unheard of for pg_stat_database to
report anywhere from 500000 to 1000000 transactions committed in an
hour. iostat's %b (disk busy) sat at 100% for longer than we'd care
to think about with the wait percentage going from a few percent on
up to 50% at times and the cpu load almost never rising from around a
2 avg., i.e. we were extremely IO bound in all cases.

As soon as we restarted postgres after making those changes the IO
load was gone. While we the number and amount of disk reads have
stayed pretty much the same and the number of disk writes have stayed
the same, the amount of data being written has dropped by about a
factor of 10, which is huge. The cpu load shot way up to around a 20
avg. and stayed that way up and stayed that way for about two days
(we're thinking that was autovacuum "catching up"). In addition, and
this is the truly confusing part, the xact_commit and xact_rollback
stats from pg_stat_database both dropped by an order of magnitude
(another factor of 10). So, we are now doing 50000 to 100000 commits
per hour during peak hours.

So, where were all of those extra transactions coming from? Are
transactions reported on in pg_stat_database anything but SQL
statements? What was causing all of the excess(?!) data being
written to the disk (it seems that there's a 1:1 correspondence
between the xacts and volume of data being written)? Given that we
have the bgwriter on, could it have been the culprit and one of the
changes allowed it to now operate more efficiently and/or correctly?

erik jones <erik(at)myemma(dot)com>
software developer
615-296-0838
emma(r)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dan Harris 2007-03-29 02:22:25 Planner doing seqscan before indexed join
Previous Message Daniel Cristian Cruz 2007-03-28 17:08:27 Re: Improving performance on system catalog