Continuous Archiving for Multiple Warm Standby Servers

From: "Thomas F(dot) O'Connell" <tf(at)o(dot)ptimized(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Continuous Archiving for Multiple Warm Standby Servers
Date: 2007-05-07 20:10:49
Message-ID: 0D21B537-CB03-43D4-93AA-58AF1D84FE44@o.ptimized.com
Lists: pgsql-general

I'm attempting to design a postgres system whereby an authoritative
primary server simultaneously feeds continuous archives to a number
of warm standby servers that live both on the local network and on
remote networks.

The sticking point in my current thinking about such a system is what
to do in the event that any of an array of possible nodes becomes
unreachable. I would expect a custom archive_command to have the
intelligence about network reachability and to report a nonzero
status if it was unable to submit an archive to any particular node.
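
To make the idea concrete, here is a minimal sketch of such a wrapper in Python. It assumes each standby is reachable as a locally mounted directory (for example over NFS); the function name archive_to_all and the paths in the comment are hypothetical, not anything PostgreSQL provides. It attempts every node and reports nonzero if any single copy fails. Note that this version overwrites a file that already exists on a destination, so it does not address the pre-existence question.

```python
import shutil
import sys
from pathlib import Path


def archive_to_all(wal_path, dest_dirs):
    """Copy one WAL file into every destination; return 0 only if all succeed."""
    failed = []
    for dest_dir in dest_dirs:
        target = Path(dest_dir) / Path(wal_path).name
        try:
            shutil.copy2(wal_path, target)
        except OSError as exc:
            # An unreachable or unmounted node surfaces here as a failed copy.
            print(f"archive to {dest_dir} failed: {exc}", file=sys.stderr)
            failed.append(dest_dir)
    # Any failure makes the whole attempt fail, so the server retries the file.
    return 1 if failed else 0


def main(argv):
    # Hypothetical wiring in postgresql.conf (script name and paths are made up):
    #   archive_command = 'python /path/to/archive_all.py %p /mnt/standby1 /mnt/standby2'
    # A script would end with: sys.exit(main(sys.argv))
    return archive_to_all(argv[1], argv[2:])
```

Trying every node before returning, rather than stopping at the first failure, means the reachable nodes stay as current as possible while the unreachable one is retried.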

The way I understand it, postgres would then resubmit the file that
caused the nonzero status, which, if connectivity has been restored,
is no problem for the node that caused the nonzero status in the
first place. But then the issue becomes what to do with the nodes
that were fine when the nonzero status was returned.

From the docs
<http://www.postgresql.org/docs/8.2/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL>:

"It is advisable to test your proposed archive command to ensure that
it indeed does not overwrite an existing file, and that it returns
nonzero status in this case. We have found that cp -i does this
correctly on some platforms but not others. If the chosen command
does not itself handle this case correctly, you should add a command
to test for pre-existence of the archive file."

What is the advised remedy for this scenario in general? And then
what is it if nonzero status is returned by archive_command because
the file already exists on nodes that stayed up after a scenario
where nonzero status is returned because one or more nodes became
unreachable?
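
One possible way to handle the retry case, offered as a sketch rather than as the documented recommendation: never overwrite, but treat a file that already exists on a node as success when its contents are byte-for-byte identical to the file being archived, and as failure otherwise. The helper name copy_once and the ".tmp" suffix are assumptions for illustration, and the destination is again assumed to be a locally visible path.

```python
import filecmp
import os
import shutil


def copy_once(wal_path, target):
    """Return True if target holds an identical copy of wal_path afterwards."""
    if os.path.exists(target):
        # Delivered on an earlier attempt: succeed only if the bytes match,
        # and never overwrite what is already there.
        return filecmp.cmp(wal_path, target, shallow=False)
    tmp = target + ".tmp"
    shutil.copyfile(wal_path, tmp)
    # Rename so the final name appears only once the copy is complete.
    os.rename(tmp, target)
    return True
```

With this behavior, a resubmitted file is a no-op on the nodes that stayed up and a fresh copy on the node that was unreachable. An OSError from an unreachable destination is deliberately left to propagate so the caller can turn it into a nonzero status.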

A follow-on question is: Does it become the responsibility of
archive_command in a scenario like this to track which files have
been archived on which nodes? Is there any introspective way for a
standby server to know that a file has been archived by the primary? If
not, is it safe to rely on using sequential numbering of WAL files
for implicit introspection? I don't see any functions that provide
introspection of this nature. I ask because it seems like network-to-
network failures are a common enough occurrence that some mechanism
for archive verification is a must-have. I'm just trying to determine
how much of that functionality I'll have to build myself...
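
As an illustration of what relying on sequential numbering might look like, the sketch below scans a list of archived file names on one node and reports gaps. It assumes the 24-hex-digit segment naming (8 digits each for timeline, log file, and segment) and, as a further assumption that should be verified for the server version in use, 255 segments per log file (00 through FE) with the default 16 MB segment size. The function name missing_segments is hypothetical, and timeline switches are only handled by checking each timeline separately.

```python
import re

SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def missing_segments(filenames, segs_per_log=255):
    """Return names of WAL segments absent between the lowest and highest present."""
    by_timeline = {}
    for name in filenames:
        if SEGMENT_RE.match(name):  # skips .backup, .history and other files
            tli = name[:8]
            log = int(name[8:16], 16)
            seg = int(name[16:], 16)
            # Flatten (log, seg) into one running number per timeline.
            by_timeline.setdefault(tli, set()).add(log * segs_per_log + seg)
    missing = []
    for tli, present in sorted(by_timeline.items()):
        for n in range(min(present), max(present) + 1):
            if n not in present:
                log, seg = divmod(n, segs_per_log)
                missing.append(f"{tli}{log:08X}{seg:08X}")
    return missing
```

A check like this can only detect holes between the first and last file a node has; it cannot tell whether the newest file on the node is also the newest file the primary has archived.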

--
Thomas F. O'Connell

optimizing modern web applications
: for search engines, for usability, and for performance :

http://o.ptimized.com/
615-260-0005
