backups blocking everything

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: backups blocking everything
Date: 2011-10-27 16:47:15
Message-ID: CAEV0TzCH5CUBgyALqR_FQLwKonEAK6suShn79fdbpv99UkiyXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I've got a large mixed-used database, with the data warehouse side of things
consisting of several tables at hundreds of millions of rows, plus a number
of tables with tens of millions. There is partitioning, but as the volume
of data has risen, individual partitions have gotten quite large. Hardware
is 2x4 core 2.0Ghz Xeon processors, 176GB of RAM, 4 drives in raid 10 for
WAL logs and 16 or 20 spindles for data, also in RAID 10. Total database
size is currently 399GB - via pg_database_size(). It's also worth noting
that we switched from 8.4 to 9.0.4 only about a month ago, and we were not
seeing this problem on 8.4.x. The database is growing, but not at some kind
of exponential rate. full backup, compressed, on the old hardware was 6.3GB
and took about 1:45:00 to be written. Recent backups are 8.3GB and taking 3
or 4 hours. We were not seeing al queries stall out during the backups on
8.4, so far as I am aware.

The time it takes for pg_dump to run has grown from 1 hour to 3 and even 4
hours over the last 6 months, with more than half of that increase occurring
since we upgrade to 9.0.x. In the last several weeks (possibly since the
upgrade to 9.0.4), we are seeing all connections getting used up (our main
apps use connection pools, but monitoring and some utilities are making
direct connections for each query, and some of them don't check for the
prior query to complete before sending another, which slowly eats up
available connections). Even the connection pool apps cease functioning
during the backup, however, as all of the connections wind up in parse
waiting state. I also see lots of sockets in close wait state for what
seems to be an indefinite period while the backup is running and all
connections are used up. I assume all of this is the result of pg_dump
starting a transaction or otherwise blocking other access. I can get
everything using a pool, that's not a huge problem to solve, but that won't
fix the fundamental problem of no queries being able to finish while the
backup is happening.

I know I'm not the only one running a database of this size. How do others
handle backups? At the moment, I don't have replication happening. I can
use the old hardware to replicate to. It doesn't have quite the i/o
capacity and nowhere near as much RAM, but I wouldn't be looking to use it
for querying unless I lost the primary, and it is definitely capable of
handling the insert load, at least when the inserts are being done directly.
I'm not sure if it is easier or harder for it to handle the same inserts
via streaming replication. My question is, what are the performance
repercussions of running a pg_dump backup off the replicated server. If it
experiences the same kind of lockup, will SR get so far behind that it can't
catch up? Is there some other preferred way to get a backup of a large db?

And finally, is the lockout I'm experiencing actually the result of a bug or
misuse of pg_dump in some way?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2011-10-27 17:39:51 Re: backups blocking everything
Previous Message Claudio Freire 2011-10-27 14:54:57 Re: PostgreSQL 9.0.4 blocking in lseek?