Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections

From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
Date: 2011-10-10 15:29:49
Message-ID: 4E930F6D.4010806@hogranch.com
Lists: pgsql-general

On 10/06/11 10:21 AM, Sean Laurent wrote:
> We've been running into a particularly strange problem that I'm trying
> to better understand. The super short version is that our application
> servers lose their connection to the database when I run a backup
> during periods of higher load and fail to reconnect.
>
> Here's an overview of the setup:
>
> - PostgreSQL 9.0.1 hosted on a cc1.4xlarge Amazon EC2 instance running
> CentOS 5.6
> - 8 disk RAID-0 array of EBS volumes used for primary data storage
> - 4 disk RAID-0 array of EBS volumes used for transaction logs
> - Root partition is ext3
> - RAID arrays are xfs
>
> Backups are taken using a script that runs the following workflow:
>
> - Tell Postgres to start a backup: SELECT pg_start_backup('RAID backup');
> - Run "xfs_freeze" on the primary RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the primary RAID array
> - Run "xfs_freeze" on the transaction log RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the transaction log RAID array
> - Tell Postgres the backup is finished: SELECT pg_stop_backup();
> - Remove old WAL files
>
> The whole process takes roughly 7 seconds on average. The RAID arrays
> are frozen for roughly 2 seconds on average.
>

While xfs_freeze is in effect, all writes are blocked. This is NOT what
you want to do here. Postgres does NOT expect you to take an atomic
snapshot of the database files; rather, bracketing your backup with
pg_start_backup and pg_stop_backup puts things in a state where a
file-by-file backup will be fine.
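
To illustrate the bracketing, here is a minimal sketch in Python. The
names bracketed_backup, run_sql and copy_files are hypothetical: run_sql
stands for whatever executes a statement on your database connection,
and copy_files for whatever copies the data files (or requests the
snapshots). It only shows the ordering, with no filesystem freeze, and
is not a complete backup script.

```python
def bracketed_backup(run_sql, copy_files, label="RAID backup"):
    """Run copy_files() between pg_start_backup and pg_stop_backup.

    run_sql(sql, params) and copy_files() are supplied by the caller;
    both are assumptions for this sketch, not part of any library.
    """
    # Tell Postgres a base backup is starting.
    run_sql("SELECT pg_start_backup(%s)", (label,))
    try:
        # Copy files or take snapshots; no freeze is needed here.
        copy_files()
    finally:
        # Always end backup mode, even if the copy step failed.
        run_sql("SELECT pg_stop_backup()", ())
```

With a driver such as psycopg2, run_sql could simply be the execute
method of a cursor on an autocommit connection. The try/finally is
there so a failed copy does not leave the server in backup mode.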

From the xfs_freeze man page:

xfs_freeze halts new access to the filesystem and creates a stable
image on disk. xfs_freeze is intended to be used with volume
managers and hardware RAID devices that support the creation of
snapshots.

The mount-point argument is the pathname of the directory where the
filesystem is mounted. The filesystem must be mounted to be frozen
(see mount(8) <http://linux.die.net/man/8/mount>).

The -f flag requests the specified XFS filesystem to be frozen from
new modifications. When this is selected, all ongoing transactions
in the filesystem are allowed to complete, new write system calls
are halted, other calls which modify the filesystem are halted, and
all dirty data, metadata, and log information are written to disk.
Any process attempting to write to the frozen filesystem will block
waiting for the filesystem to be unfrozen.

When Postgres's writer processes block, I suspect things go sour fast.
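
If a freeze is kept anyway for the sake of the volume snapshots, the
time writers spend blocked is exactly the time between the -f and -u
calls, so the thaw should be guaranteed. A rough sketch, assuming a
hypothetical helper named frozen and an example mount point such as
/mnt/pgdata (neither comes from the original script):

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mount_point, run=subprocess.check_call):
    """Freeze an XFS filesystem for the duration of the with-block.

    `run` defaults to subprocess.check_call; it is a parameter only so
    the sketch can be exercised without calling xfs_freeze for real.
    """
    # -f: freeze; writers to this filesystem block from here on.
    run(["xfs_freeze", "-f", mount_point])
    try:
        yield
    finally:
        # -u: thaw; runs even if the snapshot request raised.
        run(["xfs_freeze", "-u", mount_point])
```

Usage would look like "with frozen('/mnt/pgdata'): request_snapshots()",
where request_snapshots is again a placeholder. Keeping only the
snapshot request inside the block keeps the blocked window short.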

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
