Re: replication using WAL archives

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-admin(at)postgresql(dot)org, iain(at)mst(dot)co(dot)jp
Subject: Re: replication using WAL archives
Date: 2004-10-22 17:29:10
Message-ID: 1098466150.20926.13.camel@localhost.localdomain
Lists: pgsql-admin

On Fri, 2004-10-22 at 17:44, Gaetano Mendola wrote:
> | Gaetano - skim-reading your script, how do you handle the situation when a
> | new xlog file has been written within 10 seconds? That way the current file
> | number will have jumped by 2, so when your script looks for the "Last wal"
> | using head -1 it will find the N+2 and the intermediate file will never be
> | copied. Looks like a problem to me...
>
>
> Yes, the only failure window I have seen (though I don't know if it's possible) is:
>
> Master:
>   log N created
>   log N filled
>   archive log N
>   log N+1 created
>   log N+1 filled
>   log N+2 created
>   <---- the master dies here, before archiving log N+1
>   archive log N+1
>
>
> In this case, as you point out, the last log archived is N, and the partial
> N+2 WAL is added to the archived WAL collection, so in the recovery phase
> the recovery stops after processing log N.
>
> Is it possible for the postmaster to create the N+2 file before it has finished
> archiving N+1? (I suspect yes :-( )
>
> The only cure I see here is to look for WAL that has not been archived (if possible).
>

Hmm...well you aren't looking for archived wal, you're just looking at
wal...which is a different thing...

The situation I thought I saw was:

- copy away current partially filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away current partially filled xlog: N+2 (+10 secs later)

i.e. if time to fill xlog (is ever) < time to copy away current xlog,
then you miss one.

So the problem: you can miss one and never know you've missed it until
recovery can't find the file, and recovery never returns from that...so it
just hangs.
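
A minimal sketch of that race, in Python. This is a toy model, not
Gaetano's script: segment numbers are plain integers rather than real xlog
file names, and fill_durations, poll_interval, segments_copied and
missed_segments are all names made up for illustration. It only shows that
a poller which copies "the current segment" at fixed intervals skips any
segment that is created and filled between two polls, and that a simple
gap check over what was copied would at least reveal the hole.

```python
from bisect import bisect_right
from itertools import accumulate


def current_segment(fill_durations, t):
    """Return the index of the segment being written at time t.

    fill_durations[i] is how long (in seconds) segment i takes to fill.
    Segment 0 starts at t = 0 (hypothetical model, not real WAL timing).
    """
    switch_times = list(accumulate(fill_durations))
    # Every switch time <= t means one more segment has been completed.
    return bisect_right(switch_times, t)


def segments_copied(fill_durations, poll_interval, n_polls):
    """Segments seen by a poller that copies only the current segment."""
    return [current_segment(fill_durations, k * poll_interval)
            for k in range(n_polls)]


def missed_segments(copied):
    """Segment numbers between the first and last copy that never appear."""
    seen = set(copied)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


if __name__ == "__main__":
    # Segment 0 fills in 4s and segment 1 in 3s; the poller runs every 10s.
    copied = segments_copied([4, 3, 60], poll_interval=10, n_polls=2)
    print("copied:", copied)
    print("missed:", missed_segments(copied))
```

With those numbers the poller copies segments 0 and 2 and never sees
segment 1. Running the gap check before starting a recovery turns a silent
hang into an explicit error.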

[Just so we're all clear: we're talking about Gaetano's script, not the
PostgreSQL archiver. The PostgreSQL archiver doesn't do it that way, so
it never misses one.]

--
Best Regards, Simon Riggs
