
Re: WAL archiving to network drive

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, postgres general <pgsql-general(at)postgresql(dot)org>
Subject: Re: WAL archiving to network drive
Date: 2008-08-29 02:13:35
Lists: pgsql-general

On Wed, 20 Aug 2008, Tom Lane wrote:

> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
>> You also don't want to be the guy who has to explain why the database is
>> taking hours to come back up again after it crashed and has 4000 WAL
>> segments to replay, because archiving failed for a long time and prevented
>> proper checkpoints (ask Robert Treat if you don't believe me, he also once
>> was that guy).
> Say what?  Archiver failure can't/shouldn't prevent checkpointing.

Shouldn't, sure.  The wacky case Robert ran into, which I was alluding to,
involved the system no longer checkpointing and just piling up the archive
files.  While I think it's safe to say that was all a hardware problem,
stuff like that makes me nervous.

It is true that archiver failure prevents *normal* checkpointing, where
WAL files get recycled rather than piling up.  I know that shouldn't make
any difference, but I've also been through two similarly awful situations
caused by odd archiver problems.  They seemed mysterious at the time
(staring at the source later cleared up what really happened), and they
left me even more paranoid than usual when working in this area.
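
One way to notice the pile-up early is to watch how many segments the archiver still has queued.  Here is a minimal sketch, assuming the server's convention of marking segments that await archiving with ".ready" files under pg_xlog/archive_status.  The function names and the threshold are hypothetical, not anything from this thread.

```python
import os


def pending_wal_segments(data_dir):
    """Return the names of WAL segments still waiting to be archived."""
    status_dir = os.path.join(data_dir, "pg_xlog", "archive_status")
    return sorted(
        name[: -len(".ready")]
        for name in os.listdir(status_dir)
        if name.endswith(".ready")
    )


def archiver_is_behind(data_dir, max_pending=10):
    """True when more than max_pending segments are queued.

    max_pending is an illustrative threshold, not a recommended value.
    """
    return len(pending_wal_segments(data_dir)) > max_pending
```

Something like this would run from cron or a monitoring system, so a stuck archiver gets noticed long before thousands of segments accumulate.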

The stance I've adopted is that anything involving uncertain network
resources should be moved outside of the code the database itself
runs.  Any time you follow a different path than the usual one
through the server code (in this case, exercising the archive failure and
resubmission section), I see that as an opportunity to run into more
obscure bugs.  That's just not code that gets run or tested as often.  This
approach also minimizes the amount of admin-written software that has to
be right (bugs in the archive_command script are really bad) for the
database to keep running.
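
A minimal sketch of that split, assuming a hypothetical local staging directory and a separate shipping job.  All of the function and directory names here are made up for illustration.  The archive_command side only copies to local disk, and the copy to the network drive happens outside the server, where a failure cannot block archiving.

```python
import os
import shutil


def archive_locally(wal_path, staging_dir):
    """archive_command side: copy one WAL segment to local disk only.

    Returns 0 on success and nonzero on failure, because the server treats
    a nonzero exit status from archive_command as "retry this segment".
    """
    dest = os.path.join(staging_dir, os.path.basename(wal_path))
    if os.path.exists(dest):
        return 1  # refuse to overwrite a segment that is already staged
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(wal_path, tmp)
        os.rename(tmp, dest)  # rename is atomic within one filesystem
    except OSError:
        return 1
    return 0


def ship_staged(staging_dir, remote_dir):
    """Separate job (cron, for example): move staged segments to the network drive.

    A failure here leaves the files in staging_dir for the next run and is
    never seen by the database server.
    """
    shipped = []
    for name in sorted(os.listdir(staging_dir)):
        if name.endswith(".tmp"):
            continue  # skip copies that archive_locally has not finished
        src = os.path.join(staging_dir, name)
        tmp = os.path.join(remote_dir, name + ".tmp")
        try:
            shutil.copyfile(src, tmp)
            os.rename(tmp, os.path.join(remote_dir, name))
            os.remove(src)
        except OSError:
            break  # network trouble: stop now, retry on the next run
        shipped.append(name)
    return shipped
```

The archive_command would call a thin wrapper around archive_locally, passing %p as the path of the segment and using the return value as its exit status.  This sketch skips fsync and does not check free space in the staging area, and a real setup would want both.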

* Greg Smith gsmith(at)gregsmith(dot)com Baltimore, MD
