robust archiving of WAL segments

From: "Andrew W(dot) Gibbs" <awgibbs(at)awgibbs(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: robust archiving of WAL segments
Date: 2014-04-08 13:56:14
Message-ID: 20140408135614.GA5921@raptor.commandosoftware.com
Lists: pgsql-admin

I have a Postgres 8.4.X database that is subject to a fairly high
ingest rate. It is not uncommon for the system to generate 1-3 WAL
segments of 16 MB each every minute. I endeavor to keep these WAL
segments around both for maintaining standby servers and for
disaster recovery.

I have over time ended up with a server architecture that looks like
this:

* master database has an archive_command that copies WAL segments
out of pg_xlog and puts them in another local directory that we'll
call wal_segment_staging_area

* there is a daemon that shovels files out of
wal_segment_staging_area using rsync and places them on a separate
file server in a directory we'll call wal_segment_archive

* there are standby servers in continuous recovery mode whose
restore_command sources WAL segments from the file server's
wal_segment_archive directory
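
For concreteness, the first hop might look like the sketch below;
the function name and paths are my own illustration, not the actual
script:

```shell
# Hypothetical sketch of the first hop: archive_command stages the segment
# locally via copy-to-temp plus rename, so the shovel daemon never picks up
# a partially written file. Names and paths here are assumptions.
stage_wal_segment() {
    src=$1      # %p: path of the segment under pg_xlog
    fname=$2    # %f: bare segment file name
    stage=$3    # local staging directory

    # Per the Postgres docs' advice, refuse to overwrite an existing copy.
    if [ -e "$stage/$fname" ]; then
        return 1
    fi
    cp "$src" "$stage/.$fname.tmp" && mv "$stage/.$fname.tmp" "$stage/$fname"
}
```

Wrapped in a script, this would be wired up as something like
archive_command = '/usr/local/bin/stage_wal_segment.sh %p %f'
(hypothetical path). The rename matters: mv within one filesystem is
atomic, so readers never see a half-copied segment, though atomic is
not the same as durable, which is the crux of this whole post.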

The major considerations that governed this design are:

* the high rate of inflow generating a large number of segments quickly

* the desire not to have the WAL segments hosted on the master
database server long-term due to performance and fault-tolerance
reasons

* the fear of having an archive_command require network connectivity
to another server, the connectivity getting severed, and pg_xlog
getting filled and wedging the master server

* the problem that if you fall substantially behind due to a
transient networking problem, you want to be able to catch up as
quickly as possible, which means, among other things, that you
don't want to open a separate remote connection for every
segment in a loop; hence rsync
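
The shovel daemon in the middle is essentially a loop around a single
batched rsync. A minimal sketch, with the destination and interval
made up for illustration:

```shell
# Hypothetical shovel: one rsync invocation moves every staged segment over
# a single connection, and --remove-source-files deletes only files that
# rsync believes were transferred. Destination and interval are assumptions.
shovel_segments() {
    src=$1    # e.g. wal_segment_staging_area
    dest=$2   # e.g. fileserver:/path/to/wal_segment_archive
    rsync -a --remove-source-files "$src"/ "$dest"/
}

# Daemon form (illustrative):
#   while :; do shovel_segments /path/to/staging fileserver:/path/to/archive; sleep 10; done
```

Note that this is exactly where the hazard described below bites:
--remove-source-files deletes the source once the remote rsync
reports success, which says nothing about whether the data has
reached the remote disk.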

Over the past couple of years that this arrangement has been in
operation, I have on a handful of occasions seen what I believe to
be the following series of events:

* transfer operation of a WAL segment occurs, either from pg_xlog to
wal_segment_staging_area or from wal_segment_staging_area to
wal_segment_archive

* the process transferring the file believes that the operation has
succeeded and deletes its local copy of the file

* the server that received the file experiences a transient fault
that causes it to wedge and an administrator does a hard reset out
of desperation

* when the receiving server comes back up, the file that was created
during the archive operation is there, but it is zero-length,
presumably indicating that from the standpoint of an application
the write operation had succeeded, but from the standpoint of
caching the file's contents were not durably sync'd to disk

* when this has occurred to the file being placed in
wal_segment_staging_area, a subsequent power-on of the master
database has yielded repeated error messages in the log files
saying that the archive_command had failed (because, following the
Postgres docs' advice, the archive_command tests whether the file
already exists); eventually Postgres says "I give up trying to
archive this WAL segment!" and deletes it from pg_xlog without it
ever being archived, which blew my mind the first time I saw it
happen, because I thought the archive_command was supposed to be
retried indefinitely until success was reported

* when this has occurred to the file being placed in
wal_segment_archive, the effect is similar, although you don't get
any error messages in Postgres, just an archive that contains some
zero-length WAL segments

* in both cases, you end up with a corrupted WAL segment stream:
until you have made a new base backup, a point-in-time restore
operation would only be able to recover up to the moment of the
event, because a subset of the WAL is lost forever

Thoughts and questions:

* it seems that robust archiving of WAL segments requires that your
script be as pedantic about transferring the files as Postgres is
about creating them; Postgres doesn't report a transaction as
having closed until the relevant WAL data has been durably written
to disk, and your archiving scripts cannot afford to delete the
source file until the destination file has likewise been durably
written

* is it a bug that Postgres gives up trying to archive a WAL segment
after several failed tries, or have I somehow misconfigured
something? Perhaps, after the fashion of a restore_command for
continuous recovery mode (i.e. one that does not return until a
file has been restored), I ought likewise to implement an
archive_command that does not return until the file has been
successfully archived?

* the Postgres docs chapter "Continuous Archiving and Point-in-Time
Recovery" provides a lot of good information about rigging up an
environment for dealing with all this stuff, but to my knowledge
it gives no treatment to the matters I describe herein, which may
have left a lot of installations in a somewhat precarious
situation; there probably ought to be a treatment of ensuring
that transmitted files have been written durably

* I'm not even sure what a good solution is for guaranteeing that
transmitted files have been durably persisted using common tools;
it doesn't seem that commonly available rsync implementations
support a "please call fsync" option, though some Googling yields
discussion of a patched version that someone created for such
purposes; maybe I could invoke the shell command "sync" as part of
the dance, but that doesn't seem that great either, since the
first transfer is happening on the master database and I don't
want to issue a sync request to all file systems as that will kill
database performance, and the second transfer is happening via
rsync and you wouldn't be able to call "sync" until "rsync" had
already deleted the source files, thus creating a race condition

* I can imagine what full solutions would look like, but not ones
that don't involve a fair amount of custom code to solve what
feels like it ought to be a solved problem
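
On the durability point above: a copy that is safe to follow with
deletion of the source has to force the data to disk before reporting
success. A rough sketch of such a "pedantic" copy, assuming GNU dd
(conv=fsync) is available:

```shell
# Sketch of a durable copy, assuming GNU dd: conv=fsync makes dd call
# fsync(2) on the output file before exiting, and the rename then publishes
# the file under its final name. Caveat not solvable in plain shell: the
# rename itself is only crash-durable once the containing directory has
# been fsync'd, which needs a small C or Python helper.
durable_copy() {
    src=$1; dst=$2
    dd if="$src" of="$dst.tmp" conv=fsync 2>/dev/null && mv "$dst.tmp" "$dst"
}
```

For what it's worth, newer GNU coreutils (8.24 and later, so after
the date of this post) also let sync(1) target a single file via
"sync --data FILE", which would avoid the whole-filesystem sync
worry raised above.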
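
As for the question above about an archive_command that does not
return until success, the blocking variant is easy to sketch, though
it trades the give-up behavior for unbounded pg_xlog growth while the
loop spins (the retry interval is my own assumption):

```shell
# Hypothetical blocking archive_command: retry the copy until it works
# rather than ever returning failure to Postgres. While this loops,
# segments accumulate in pg_xlog, so disk-space monitoring is essential.
archive_until_done() {
    src=$1; dst=$2
    until cp "$src" "$dst.tmp" && mv "$dst.tmp" "$dst"; do
        sleep 5
    done
}
```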

How have you folks dealt with needing this level of robustness?

-- AWG
