Re: Using streaming replication as log archiving

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using streaming replication as log archiving
Date: 2010-09-30 14:24:29
Message-ID: AANLkTin9T3KmkXaLNkU3kccyCZdWcXjquTwP3K1Wr0kJ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 30, 2010 at 15:45, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>>> If you could keep the development "friendly" to such features, I
>>> may get around to adding them to support our needs....
>>
>> Would it be enough to have kind of an "archive_command" switch
>> that says "whenever you've finished a complete wal segment, run
>> this command on it"?
>
> That would allow some nice options.  I've been thinking what would
> be the ideal use of this with our backup scheme, and the best I've
> thought up would be that each WAL segment file would be a single
> output stream, with the option of calling an executable (which could
> be a script) with the target file name and then piping the stream to
> it.  At 16MB or a forced xlog switch, it would close the stream and
> call the executable again with a new file name.  You could have a
> default executable for the default behavior, or just build in a
> default if no executable is specified.

The problem with that one (which I'm sure is solvable somehow) is how
to deal with restarts: both restarts in the middle of a segment
(which happen all the time if you don't have archive_timeout set) and
restarts between segments. How would the tool know where to begin
streaming again? Right now it looks at the files, but with your
suggestion there are no files to look at. We'd need a second
script/command to call to figure out where to restart from in that
case, no?
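
As a rough sketch of what "looking at the files" could mean (this is
not pg_streamrecv's actual code; the function names are made up for
illustration, and it assumes the directory holds files with standard
24-hex-digit WAL segment names):

```python
import os
import re

# Standard WAL segment file names are 24 hex digits:
# 8 for the timeline, 8 for the log id, 8 for the segment number.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def last_segment(archive_dir):
    """Return the highest-sorting WAL segment name in archive_dir, or None."""
    names = [n for n in os.listdir(archive_dir) if WAL_NAME.match(n)]
    return max(names) if names else None


def split_segment_name(name):
    """Split a segment name into (timeline, log, seg) integers."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)
```

Since the last file found may be incomplete, one conservative choice
(an assumption here, not something the tool is known to do) is to
restart streaming from the beginning of that segment. A piped
per-segment stream leaves nothing for a scan like this to find.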

> The reason I like this is that I could pipe the stream through
> pg_clearxlogtail and gzip pretty much "as is" to the locations on
> the database server currently used for rsync to the two targets, and
> the rsync commands would send the incremental changes once per
> minute to both targets.  I haven't thought of another solution which
> provides incremental transmission of the WAL to the local backup
> location, which would be a nice thing to have, since this is most
> crucial when the WAN is down and not only is WAL data not coming
> back to our central location, but our application framework based
> replication stream isn't making it back, either.

It should be safe to just rsync the archive directory as it's being
written by pg_streamrecv. Doesn't that give you the property you're
looking for - local machine gets data streamed in live, remote machine
gets it rsynced every minute?
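
A minimal sketch of that once-a-minute rsync, in Python. The function
names, the archive path and the targets are placeholders, and "-a" is
just one reasonable choice of rsync option, not a recommendation:

```python
import subprocess
import time


def rsync_command(archive_dir, target):
    # Trailing slash: copy the directory's contents, not the directory itself.
    return ["rsync", "-a", archive_dir.rstrip("/") + "/", target]


def sync_forever(archive_dir, targets, interval=60):
    """Push the archive directory to every target once per interval."""
    while True:
        for target in targets:
            # check=False: a failed transfer (e.g. WAN down) is simply
            # retried on the next round.
            subprocess.run(rsync_command(archive_dir, target), check=False)
        time.sleep(interval)
```

Something like sync_forever("/path/to/archive", ["local:/wal/",
"remote:/wal/"]) would then run alongside the streaming receiver.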

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
