Re: replication using WAL archives

From: "Iain" <iain(at)mst(dot)co(dot)jp>
To: "Gaetano Mendola" <mendola(at)bigfoot(dot)com>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: replication using WAL archives
Date: 2004-10-22 01:52:09
Message-ID: 011a01c4b7d9$c121a390$7201a8c0@mst1x5r347kymb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Thanks Gaetano, and Robert.

I'm not in the habit of reading the hackers list, so I still have no idea
where I first read about it :)

Anyway, I did some reading and it seems like there are good possibilities
here, but still quite a bit of work to do. I'll add this to my list of HA
options and keep an eye on it's progress.

regards
Iain
----- Original Message -----
From: "Gaetano Mendola" <mendola(at)bigfoot(dot)com>
To: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>;
<pgsql-admin(at)postgresql(dot)org>
Cc: <iain(at)mst(dot)co(dot)jp>
Sent: Friday, October 22, 2004 6:49 AM
Subject: Re: [ADMIN] replication using WAL archives

> Robert Treat wrote:
>> On Thu, 2004-10-21 at 02:44, Iain wrote:
>>
>>>Hi,
>>>
>>>I thought I read something about this in relation to v8, but I can't
>>>find any reference to it now... is it (or will it be) possible to do
>>>master-slave style database replication by transmitting log files to the
>>>standby server and having it process them?
>>>
>>
>>
>> I'm not certain if this is 8.0, but some folks have created a working
>> version against the 8.0 code that will do something like this. Search
>> the pgsql-hacker mail list archives for more information on it.
>
> I sent a post on hackers, I put it here:
>
> =======================================================================
> Hi all,
> I seen that Eric Kerin did the work suggested by Tom about
> how to use the PITR in order to have an hot spare postgres,
> writing a C program.
>
> I did the same writing 2 shell scripts, one of them perform
> the restore the other one deliver the partial filled wal and
> check if the postmaster is alive ( check if the pid process
> still exist ).
>
> With these two scripts I'm able to have an hot spare installation,
> and the spare one go alive when the first postmaster dies.
>
> How test it:
>
> 1) Master node:
> modify postgresql.conf using:
>
> ~ archive_command = 'cp %p /mnt/server/archivedir/%f'
>
> ~ launch postgres and perform a backup as doc
>
> ~ http://developer.postgresql.org/docs/postgres/backup-online.html
>
> suggest to do
>
> launch the script:
>
> partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>
>
> ~ this script will delivery each 10 seconds the "current" wal file,
> ~ and touch the "alive" file in order to notify the spare node that
> ~ the master node is up and running
>
>
> 2) Spare node:
> create a recovery.conf with the line:
>
> ~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p
> /mnt/server/partialdir'
>
> ~ replace the content of data directory with the backup performed
> at point 1,
> ~ remove any file present in the pg_xlog directory ( leaving there
> the archive_status
> ~ directory ) and remove the postmaster.pid file ( this is
> necessary if you are running
> ~ the spare postgres on the same hw ).
>
> ~ launch the postmaster, the restore will continue till the "alive"
> file present in the
> ~ /mnt/server/partialdir directory is not updated for 60 seconds
> ( you can modify this
> ~ values inside the restore.sh script ).
>
> Be sure that restore.sh and all directories involved are accessible
>
> Let me know.
>
>
> This is a first step, of course, as Eric Kerin did, is better port these
> script
> in C and make it more robust.
>
> Postgres can help this process, as suggested by Tom creating a
> pg_current_wal()
> or even better having two new GUC parameters: archive_current_wal_command
> and
> archive_current_wal_delay.
>
> I problem I discover during the tests is that if you shut down the spare
> node
> and the restore_command is still waiting for a file then the postmaster
> will never
> exit :-(
> ==========================================================================
>
> I hope that is clear.
>
>
>
> Regards
> Gaetano Mendola
>
>
>

--------------------------------------------------------------------------------

> #!/bin/bash
>
>
> SOURCE=$1
> TARGET=$2
> PARTIAL=$3
>
> SIZE_EXPECTED=16777216 #bytes 16 MB
> DIED_TIME=60 #seconds
>
> function test_existence
> {
> if [ -f ${SOURCE} ]
> then
> COUNTER=0
>
> #I have to check if the file is begin copied
> #I assume that it will reach the right
> #size in a few seconds
>
> while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
> do
> sleep 1
> let COUNTER+=1
> if [ 20 -lt $COUNTER ]
> then
> exit 1 # BAILING OUT
> fi
> done
>
> cp $SOURCE $TARGET
> exit 0
> fi
> echo ${SOURCE}"> not found"
>
> #if is looking for a history file and not exist
> #I have suddenly exit
> echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
> }
>
>
> while [ 1 ]
> do
>
> test_existence
>
> #CHECK IF THE MASTER IS ALIVE
> DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
> if [ $DIED_TIME -lt $DELTA_TIME ]
> then
> echo "Master is dead..."
> # Master is dead
> CURRENT_WAL=$( basename $SOURCE )
> echo "Partial: " ${PARTIAL}
> echo "Current wal: " ${CURRENT_WAL}
> echo "Target: " ${TARGET}
> cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 &&
> exit 0
> exit 1
> fi
>
> sleep 1
>
> done
>

--------------------------------------------------------------------------------

> #!/bin/bash
>
> PID=$1
> PARTIAL=$2
> PGXLOG=$3
>
> function copy_last_wal
> {
> FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
>
> echo "Last Wal> " $FILE
>
> cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
> mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
> find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
> }
>
>
> while [ 1 ]
> do
> ps --pid $PID > /dev/null 2>&1
> ALIVE=$?
>
> if [ "${ALIVE}" == "1" ]
> then
> #The process is dead
> echo "Process dead"
> copy_last_wal
> exit 1
> fi
>
> #The process still exist
> touch ${PARTIAL}/alive
> copy_last_wal
>
> sleep 10
> done
>

--------------------------------------------------------------------------------

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Chris White (cjwhite) 2004-10-22 05:41:46 Upgrading to 7.4.5
Previous Message Tom Lane 2004-10-21 23:55:01 Re: About System Catalogs