Re: Using streaming replication as log archiving

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using streaming replication as log archiving
Date: 2010-09-30 15:00:58
Message-ID: AANLkTimot6G7mOAvML=ss0u5++t=07_2E2va7aTdDs9F@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 30, 2010 at 16:39, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
> On Thu, Sep 30, 2010 at 10:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>>> That would allow some nice options.  I've been thinking what would
>>> be the ideal use of this with our backup scheme, and the best I've
>>> thought up would be that each WAL segment file would be a single
>>> output stream, with the option of calling an executable (which could
>>> be a script) with the target file name and then piping the stream to
>>> it.  At 16MB or a forced xlog switch, it would close the stream and
>>> call the executable again with a new file name.  You could have a
>>> default executable for the default behavior, or just build in a
>>> default if no executable is specified.
>>
>> The problem with that one (which I'm sure is solvable somehow) is how
>> to deal with restarts. Both restarts in the middle of a segment
>> (happens all the time if you don't have an archive_timeout set), and
>> really also restarts between segments. How would the tool know where
>> to begin streaming again? Right now, it looks at the files - but doing
>> it by your suggestion there are no files to look at. We'd need a
>> second script/command to call to figure out where to restart from in
>> that case, no?
>
> And then think of the future, when sync rep is in... I'm hoping to be
> able to use something like this to do synchronous replication to my
> archive (instead of to a live server).

Right, that could be a future enhancement. Doesn't mean we shouldn't
still do our best with the async mode of course :P

>> It should be safe to just rsync the archive directory as it's being
>> written by pg_streamrecv. Doesn't that give you the property you're
>> looking for - local machine gets data streamed in live, remote machine
>> gets it rsynced every minute?
>
> When the "being written to" segment completes and moves to the final
> location, he'll get an extra whole "copy" of the file.  But if the

Ah, good point.

> "move" can be an exec of his script, the compressed/gzipped final
> result shouldn't be that bad.  Certainly no worse than what he's
> currently getting with archive command ;-)  And he's got the
> uncompressed incremental updates as they are happening.

Yeah, it would be trivial to replace the rename() call with a call to
a script that gets to do whatever is suitable to the file. Actually,
it'd probably be better to rename() it *and* call the script, so that
we can continue properly if the script fails.
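
A rough sketch of that rename-then-call-script idea, in Python only for
brevity. This is not pg_streamrecv code: the function name
finalize_segment, the hook_argv parameter, and the ".partial" suffix in
the usage note are all hypothetical, chosen for illustration. The point
it shows is the ordering: the rename happens first and is never undone,
so a failing script is reported but does not stop us from continuing.

```python
import os
import subprocess
import sys


def finalize_segment(partial_path, final_path, hook_argv=None):
    """Rename a completed segment into place, then optionally run a hook.

    Returns True if no hook was given or the hook exited with status 0,
    and False if the hook failed or could not be started. The rename is
    never rolled back, so the caller can keep streaming either way.
    """
    # Step 1: rename first, so the finished segment is in its final
    # location regardless of what the hook does afterwards.
    os.rename(partial_path, final_path)

    if not hook_argv:
        return True

    # Step 2: run the (hypothetical) hook with the final path appended
    # as its last argument. check=False: a non-zero exit is reported
    # through the return value instead of raising an exception.
    try:
        result = subprocess.run(list(hook_argv) + [final_path], check=False)
    except OSError as exc:
        print(f"hook could not be started: {exc}", file=sys.stderr)
        return False

    if result.returncode != 0:
        print(
            f"hook exited with status {result.returncode} for {final_path}",
            file=sys.stderr,
        )
        return False
    return True
```

Under those assumptions, a call might look like
finalize_segment("000000010000000000000001.partial",
"000000010000000000000001", ["/usr/local/bin/compress-and-ship"]),
where the script path is made up. A False return only means the script
needs retrying; the segment itself is already in place.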

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
