Re: WAL archiving to network drive

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, postgres general <pgsql-general(at)postgresql(dot)org>
Subject: Re: WAL archiving to network drive
Date: 2008-08-29 02:13:35
Message-ID: Pine.GSO.4.64.0808282159450.11207@westnet.com
Lists: pgsql-general

On Wed, 20 Aug 2008, Tom Lane wrote:

> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
>> You also don't want to be the guy who has to explain why the database is
>> taking hours to come back up again after it crashed and has 4000 WAL
>> segments to replay, because archiving failed for a long time and prevented
>> proper checkpoints (ask Robert Treat if you don't believe me, he also once
>> was that guy).
>
> Say what? Archiver failure can't/shouldn't prevent checkpointing.

Shouldn't, sure. The wacky case Robert ran into, which I was alluding to,
involved the system no longer checkpointing and just piling up the archive
files. While I think it's safe to say that was all a hardware problem,
stuff like that makes me nervous.

It is true that archiver failure prevents *normal* checkpointing, where
WAL files get recycled rather than piling up. I know that shouldn't make
any difference, but I've also been through two similarly awful situations
caused by odd archiver problems that seemed mysterious at the time
(staring at the source later cleared up what really happened), and they
left me even more paranoid than usual when working in this area.
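One way to notice the "piling up" case early is to watch the archive_status
directory: the server leaves a `<segment>.ready` marker there for each WAL
segment waiting on archive_command and renames it to `<segment>.done` once
archiving succeeds. The Python sketch below assumes the 8.x layout
(`pg_xlog/archive_status`; from PostgreSQL 10 on it is
`pg_wal/archive_status`), and the function names and alert threshold are
hypothetical, not anything suggested in this thread.

```python
import os


def count_pending_segments(archive_status_dir: str) -> int:
    """Count WAL segments still waiting for a successful archive_command.

    The server writes '<segment>.ready' into archive_status for each
    finished segment and renames it to '<segment>.done' once archived.
    """
    return sum(1 for name in os.listdir(archive_status_dir)
               if name.endswith(".ready"))


def archiver_is_falling_behind(archive_status_dir: str,
                               threshold: int = 10) -> bool:
    """Return True when more than `threshold` segments are queued.

    `threshold` is a hypothetical alert level; tune it to your WAL volume.
    """
    return count_pending_segments(archive_status_dir) > threshold
```

Run from cron or a monitoring agent against something like
`<data directory>/pg_xlog/archive_status`; a count that keeps growing means
archive_command is failing and segments are not being recycled.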

The stance I've adopted is that anything involving uncertain network
resources should be moved outside the code the database itself runs. Any
time the server follows a different path than the usual one through its
code (in this case, exercising the archive failure and resubmission
section), I see an opportunity to run into more obscure bugs; that's just
not code that gets run and tested as often. Doing this also minimizes the
amount of admin-written software that has to be right (bugs in the
archive_command script are really bad) for the database to keep running.
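A sketch of how that split might look, under assumptions: archive_command
only copies each segment to a local directory (a simple operation that
should almost never fail), and a separate job outside the database ships
the local archive to the network drive. Everything here (function names,
directories, how a wrapper script would call `archive_locally` with `%p`) is
hypothetical and only illustrates the idea; pruning local copies after a
verified transfer is left out.

```python
import os
import shutil


def archive_locally(wal_path: str, local_archive_dir: str) -> None:
    """Body of a minimal archive_command: copy one segment to LOCAL disk.

    Any exception should make the wrapper script exit non-zero, so the
    server keeps the segment and retries later.
    """
    dest = os.path.join(local_archive_dir, os.path.basename(wal_path))
    if os.path.exists(dest):
        # Never overwrite a segment that was already archived.
        raise FileExistsError(dest)
    tmp = dest + ".tmp"
    with open(wal_path, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the copy is on disk first
    os.rename(tmp, dest)  # atomic rename within one filesystem


def ship_to_network(local_archive_dir: str, network_dir: str) -> list:
    """Runs OUTSIDE the database (cron, a loop, etc.).

    Copies finished segments to the network drive. A network failure here
    only delays shipping; it never makes archive_command fail.
    """
    shipped = []
    for name in sorted(os.listdir(local_archive_dir)):
        if name.endswith(".tmp"):
            continue  # still being written by archive_locally
        dest = os.path.join(network_dir, name)
        if os.path.exists(dest):
            continue  # already shipped on an earlier run
        tmp = dest + ".tmp"
        try:
            shutil.copyfile(os.path.join(local_archive_dir, name), tmp)
            os.rename(tmp, dest)
        except OSError:
            break  # network unavailable: stop and retry on the next run
        shipped.append(name)
    return shipped
```

The design choice is that the only code the server depends on touches
local disk; all the uncertainty of the network mount lives in
`ship_to_network`, where a hang or error cannot back up WAL recycling.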

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
