Re: replication using WAL archives

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>,pgsql-admin(at)postgresql(dot)org, iain(at)mst(dot)co(dot)jp
Subject: Re: replication using WAL archives
Date: 2004-10-22 17:29:10
Message-ID: 1098466150.20926.13.camel@localhost.localdomain
Lists: pgsql-admin

On Fri, 2004-10-22 at 17:44, Gaetano Mendola wrote:
> | Gaetano - skim-reading your script, how do you handle the situation when a
> | new xlog file has been written within 10 seconds? That way the current file
> | number will have jumped by 2, so when your script looks for the "Last wal"
> | using head -1 it will find the N+2 and the intermediate file will never be
> | copied. Looks like a problem to me...
> 
> 
> Yes, the only failure window I have seen (though I don't know if it's possible) is:
> 
> Master:
>     log N created
>     log N filled
>     archive log N
>     log N+1 created
>     log N+1 filled
>     log N+2 created
>         <---- the master dies here, before archiving log N+1
>     archive log N+1
> 
> 
> In this case, as you point out, the last log archived is N, the partial
> N+2 WAL is added to the archived WAL collection, and in the recovery phase
> the recovery stops after processing log N.
>
> Is it possible for the postmaster to create the N+2 file before it has
> finished archiving N+1? (I suspect yes :-( )
>
> The only cure I see here is to look for WAL files that have not been
> archived (if possible).
> 

Hmm...well you aren't looking for archived wal, you're just looking at
wal...which is a different thing...

Situation I thought I saw was:

- copy away the current, partially filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away the current, partially filled xlog: N+2 (10 secs later)

i.e. if the time to fill an xlog is ever less than the interval between
copies of the current xlog, then you can miss one (here, N+1).
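
To make the timing concrete, here is a rough simulation of a copier that
wakes up at a fixed interval and copies only whatever xlog is current at
that moment. It is a sketch only: the function names, the plain integer
segment numbers and the fill times are all made up for illustration and
are not taken from Gaetano's script.

```python
from bisect import bisect_right


def current_segment(fill_times, t):
    """Return the number of the xlog being written at time t.

    fill_times[i] is the (hypothetical) time at which xlog i fills and
    xlog i+1 starts; the list must be sorted in ascending order.
    """
    return bisect_right(fill_times, t)


def segments_seen(fill_times, poll_interval, end_time):
    """Simulate a copier that polls every poll_interval seconds and
    copies only the xlog that is current at each poll."""
    seen = []
    t = 0
    while t <= end_time:
        seg = current_segment(fill_times, t)
        if seg not in seen:
            seen.append(seg)
        t += poll_interval
    return seen


def missed_segments(seen):
    """Return the xlog numbers between the first and last one seen
    that the copier never copied."""
    have = set(seen)
    return [n for n in range(min(seen), max(seen) + 1) if n not in have]


# xlog 0 fills at t=3 and xlog 1 fills at t=7, both inside one 10 sec poll
seen = segments_seen([3, 7], poll_interval=10, end_time=10)
print(seen, missed_segments(seen))
```

With xlogs filling more slowly than the poll interval (say at t=12 and
t=25), every segment is current at some poll and nothing is missed.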

So the problem: you can miss one and never know you've missed it until the
recovery can't find it, and it never returns from that...so it just hangs.
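
One way to find out before recovery does would be to check the copied
files for holes up front. A minimal sketch, assuming the usual 24 hex
digit xlog file names (timeline, log id, segment, 8 digits each), a
single timeline, and a segment counter that rolls over into the next log
id after segs_per_log segments. The default of 255 is my assumption for
16MB segments on current releases, so treat it as a parameter to verify
rather than a fact. find_gaps and parse_wal_name are hypothetical helper
names, not part of PostgreSQL.

```python
import re

WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def parse_wal_name(name):
    """Split an xlog file name into (timeline, log id, segment) integers."""
    if not WAL_NAME.match(name):
        raise ValueError("not an xlog file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def find_gaps(names, segs_per_log=255):
    """Return the names of xlog files missing between the lowest and
    highest file in names.

    segs_per_log is an ASSUMPTION about how many segments exist per
    log id; check it against your PostgreSQL version.
    """
    if not names:
        return []
    parsed = [parse_wal_name(n) for n in names]
    timelines = {tli for tli, _, _ in parsed}
    if len(timelines) != 1:
        raise ValueError("this sketch handles a single timeline only")
    tli = timelines.pop()

    # Flatten (log id, segment) into one running position per file.
    have = {log * segs_per_log + seg for _, log, seg in parsed}

    missing = []
    for pos in range(min(have), max(have) + 1):
        if pos not in have:
            log, seg = divmod(pos, segs_per_log)
            missing.append("%08X%08X%08X" % (tli, log, seg))
    return missing


copied = ["000000010000000000000003", "000000010000000000000005"]
print(find_gaps(copied))
```

If this returns anything, the standby is missing a file and recovery
would stall on it, so the script could raise an alarm instead of waiting.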

[Just so we're all clear: we're talking about Gaetano's script, not the
PostgreSQL archiver. The PostgreSQL archiver doesn't do it that way, so
it never misses one.]

-- 
Best Regards, Simon Riggs

