Re: replication using WAL archives

From:	Gaetano Mendola <mendola(at)bigfoot(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: replication using WAL archives
Date:	2004-10-22 20:50:34
Message-ID:	4179729A.5020401@bigfoot.com
Lists:	pgsql-admin

Simon Riggs wrote:

> Situation I thought I saw was:
>
> - copy away current partial filled xlog N
> - xlog N fills, N+1 starts
> - xlog N+1 fills, N+2 starts
> - copy away current partial filled xlog: N+2 (+10 secs later)
>
> i.e. if time to fill xlog (is ever) < time to copy away current xlog,
> then you miss one.
>
> So problem: you can miss one and never know you've missed one until the
> recovery can't find it, which it never returns from...so it just hangs.

No. restore.sh is not smart enough to know which WAL is the last one that
must be replayed; the only "smart thing" it does is copy the supposed
"current WAL" into the archive directory.

The script hangs (which is a feature, not a bug) if and only if the master is
alive (at least, I'm not seeing any other hang).

In your example, the archive directory will contain the files up to logN, and
logN+2 (the current WAL) will be in the partial directory. If the master dies,
restore.sh will copy logN+2 into the archive directory. The spare node will
then execute restore.sh with logN+1 as its argument, and if that file is not
found, restore.sh will exit.
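
To make the behaviour described above concrete, here is a rough Python sketch
of that logic. It is not the actual restore.sh: the function name
restore_wal, the master_alive callback, the directory arguments and the
polling interval are all assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def restore_wal(
    wal_name: str,
    dest: Path,
    archive_dir: Path,
    partial_dir: Path,
    master_alive: Callable[[], bool],  # hypothetical liveness check
    poll_seconds: float = 1.0,         # assumed polling interval
) -> bool:
    """Copy wal_name to dest; return False when recovery should stop."""
    archived = archive_dir / wal_name

    # While the master is alive, a missing segment only means "not archived
    # yet", so keep waiting. This is the intentional hang.
    while not archived.exists() and master_alive():
        time.sleep(poll_seconds)

    if not archived.exists():
        # Master is dead: copy the supposed "current WAL" from the partial
        # directory into the archive so that it can still be replayed.
        for partial in partial_dir.iterdir():
            if partial.is_file():
                shutil.copy2(partial, archive_dir / partial.name)

    if archived.exists():
        shutil.copy2(archived, dest)
        return True

    # The requested segment was never archived (logN+1 in the example):
    # exit instead of hanging.
    return False
```

With logN in the archive, logN+2 in the partial directory and a dead master,
a call for logN succeeds and a call for logN+1 returns False after logN+2 has
been copied into the archive, which matches the exit described above.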

Regards
Gaetano Mendola

In response to

Re: replication using WAL archives at 2004-10-22 17:29:10 from Simon Riggs
