On Fri, 2004-10-22 at 17:44, Gaetano Mendola wrote:
> | Gaetano - skim-reading your script, how do you handle the situation when a
> | new xlog file has been written within 10 seconds? That way the current file
> | number will have jumped by 2, so when your script looks for the "Last wal"
> | using head -1 it will find the N+2 and the intermediate file will never be
> | copied. Looks like a problem to me...
> Yes, the only failure window I have seen (though I don't know whether it is possible) is:
> log N created
> log N filled
> archive log N
> log N+1 created
> log N+1 filled
> log N+2 created
> <---- the master dies here, before archiving log N+1
> archive log N+1
> In this case, as you point out, the last log archived is N, the partial N+2
> WAL is added to the archived WAL collection, and in the recovery phase
> the recovery stops after processing log N.
> Is it possible for the postmaster to create the N+2 file without having finished
> archiving N+1? ( I suspect yes :-( )
> The only cure I see here is to look for WAL that has not been archived ( if possible ).
Hmm...well you aren't looking for archived wal, you're just looking at
wal...which is a different thing...
The situation I thought I saw was:
- copy away the current, partially filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away the current, partially filled xlog: N+2 (10 secs later)
i.e. if the time to fill an xlog is ever shorter than the interval between
copies of the current xlog, then you miss one.
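
To make the timing concrete, here is a minimal Python sketch of a poller that only ever looks at the newest segment. It is not Gaetano's script: the function name `segments_seen_by_poller`, the integer segment numbers and the example times are all made up for illustration.

```python
import bisect


def segments_seen_by_poller(switch_times, poll_interval, horizon):
    """Return the segment numbers a "copy the newest only" poller observes.

    switch_times[i] is the (hypothetical) time at which segment i+1 becomes
    the current one; segment 0 is current from t=0. Must be sorted ascending.
    """
    seen = []
    t = 0
    while t <= horizon:
        # Number of switches that have happened by time t == current segment.
        current = bisect.bisect_right(switch_times, t)
        if not seen or seen[-1] != current:
            seen.append(current)
        t += poll_interval
    return seen


# Segments switch at t=4 and t=8, but the poller only wakes every 10 secs:
print(segments_seen_by_poller([4, 8], 10, 10))  # [0, 2] -> segment 1 is missed
# Polling faster than the segments fill catches every one:
print(segments_seen_by_poller([4, 8], 3, 9))    # [0, 1, 2]
```

The point is only that no polling interval is safe unless it is guaranteed to be shorter than the fastest possible fill time.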
So the problem is: you can miss one and never know you've missed one until
the recovery can't find it. Recovery never returns from that...so it just hangs.
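
One way to find out before recovery does would be to check the copied set for holes. A rough Python sketch, assuming the copied segments have already been mapped to consecutive integers (the N, N+1, N+2 of this thread); mapping real xlog file names to such numbers is deliberately left out, and `find_missing_segments` is a hypothetical helper, not part of any existing script.

```python
def find_missing_segments(copied):
    """Return the sequence numbers absent between the lowest and highest copied."""
    if not copied:
        return []
    seen = set(copied)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# The scenario above with N = 7: the script copied 7, then next saw 9.
print(find_missing_segments([7, 9]))  # [8]
```

A check like this only reports the hole; it cannot get the missed file back once the server has recycled it.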
[Just so we're all clear: we're talking about Gaetano's script, not the
PostgreSQL archiver. The PostgreSQL archiver doesn't do it that way, so
it never misses one.]
Best Regards, Simon Riggs