Re: WARNINGs after starting backup server created with PITR

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Erik Jones <erik(at)myemma(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, General postgres mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: WARNINGs after starting backup server created with PITR
Date: 2008-01-19 19:24:08
Message-ID: 1200770648.4255.464.camel@ebony.site
Lists: pgsql-general

On Sat, 2008-01-19 at 11:28 -0600, Erik Jones wrote:

> All of the warnings are below. For tables that had multiple warnings
> they seem to be for consecutive pages. All of these tables were
> seeing some pretty decent write traffic during the base backup which
> took place Tuesday night. The good news, for us, is that none of the
> queue tables matter -- that was transient data. For the other two
> history tables, since the PITR recovery ended at 4:58 am Wednesday
> morning, when it ran up against the dropped WAL file, they've already
> seen a good amount of traffic.

> However, I've been able to verify
> missing data by checking that the newest timestamp fields are older than
> those in other tables that had no errors, finding differences of as
> much as 11 hours.
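A minimal sketch of the newest-timestamp comparison described here, in Python. It assumes the newest timestamp of each table has already been fetched, for example with a max() query on each table's timestamp column. The function name, table names and values are hypothetical and only illustrate the check.

```python
from datetime import datetime, timedelta


def find_lagging_tables(newest: dict[str, datetime],
                        threshold: timedelta) -> dict[str, timedelta]:
    """Return each table whose newest row trails the freshest table by more than threshold."""
    freshest = max(newest.values())
    return {
        table: freshest - ts
        for table, ts in newest.items()
        if freshest - ts > threshold
    }


# Hypothetical per-table maxima, e.g. fetched one table at a time with
# SELECT max(<timestamp column>) FROM <table>;
newest = {
    "table_without_warnings": datetime(2008, 1, 1, 12, 0),
    "table_with_warnings": datetime(2008, 1, 1, 3, 30),
}
for table, lag in sorted(find_lagging_tables(newest, timedelta(hours=1)).items()):
    print(f"{table}: newest row is {lag} behind")
```

Comparing against the freshest table, rather than against the clock, keeps the check meaningful on a standby whose recovery stopped at some point in the past.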

So we definitely have missing data. I think the multi-phase rsync is
suspect and should be avoided until we get to the bottom of this.

Can you examine the server logs on both the primary and standby servers
for that 11-hour period and see if there are any other messages that
might be relevant?

Can you also save the WAL files covering that period? We may want to
inspect them later to confirm whether the data was actually in them or
not. Can you save the very first WAL files after the recovery started?
We can use those to examine the first data blocks touched, which would
help confirm when the damage occurred.
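To find the first WAL files after the recovery started, the START WAL LOCATION from the backup label can be mapped to segment file names. A minimal sketch in Python, assuming the default 16 MB segment size and the 8.x naming scheme (timeline, xlogid and segment number as three 8-digit hex fields, with 255 segments per xlogid). The function names are hypothetical, and the location in the example is made up.

```python
# Assumptions: default 16 MB WAL segments, and the 8.x naming scheme in which
# each xlogid holds 255 segments (00..FE); later releases use all 256.
XLOG_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0xFF


def parse_lsn(text: str) -> tuple[int, int]:
    """Split an 'X/Y' WAL location (both halves hex) into (xlogid, xrecoff)."""
    hi, lo = text.strip().split("/")
    return int(hi, 16), int(lo, 16)


def wal_files_from(timeline: int, lsn: str, count: int) -> list[str]:
    """Names of the WAL segment containing lsn and the count - 1 segments after it."""
    xlogid, xrecoff = parse_lsn(lsn)
    seg = xrecoff // XLOG_SEG_SIZE
    names = []
    for _ in range(count):
        names.append(f"{timeline:08X}{xlogid:08X}{seg:08X}")
        seg += 1
        if seg >= SEGS_PER_XLOGID:  # roll over to the next xlogid
            xlogid, seg = xlogid + 1, 0
    return names


# START WAL LOCATION as it might appear in a backup label (value is made up).
for name in wal_files_from(1, "0/9000020", 3):
    print(name)
```

The timeline argument matters because the end of recovery switches the restored server to a new timeline; the files replayed during recovery carry the original one.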

Has rsync produced any messages of note? Might your script be ignoring
errors? What versions of rsync are you using?
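One way a copy script can end up ignoring errors is by never looking at rsync's exit status. A minimal sketch in Python of a wrapper that surfaces rsync's messages and refuses to continue on a non-zero status, assuming the script drives rsync through a subprocess call. The names `run_checked` and `PARTIAL_TRANSFER` are hypothetical; exit codes 23 and 24 are rsync's documented "partial transfer" statuses.

```python
import subprocess

# rsync exit codes that mean the copy is incomplete.
PARTIAL_TRANSFER = {
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
}


def run_checked(cmd: list[str]) -> None:
    """Run a copy command and raise if it exits non-zero, instead of carrying on silently."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stderr:
        # Keep rsync's own warnings in the script's log.
        print(result.stderr, end="")
    if result.returncode != 0:
        reason = PARTIAL_TRANSFER.get(result.returncode, "see the rsync man page")
        raise RuntimeError(f"{cmd[0]} exited with status {result.returncode}: {reason}")
```

It would be called as `run_checked(["rsync", "-a", "<source dir>/", "<host>:<dest dir>/"])`, with the placeholders filled in. Status 24 is common when copying a live data directory, because files disappear during the copy, and some scripts choose to log it rather than fail; the point is that this should be a deliberate decision rather than a silent one.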

Is the history table insert-only? If so, we might use that fact to examine
the LSNs of the equivalent blocks on the primary. If those LSNs are earlier
than the start of the recovery, as noted in the backup label file of the
original base backup, then we can tell whether rsync or postgres is the
cause. Do you still have the base backup label file, or the base backup itself?
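A minimal sketch of that LSN comparison in Python, assuming a copy of the primary's relation file and the text of the original backup_label are available. It lists the blocks whose page LSN (pd_lsn) is earlier than the backup's START WAL LOCATION. The helper names are hypothetical. The page-layout assumptions are noted in the comments: 8 kB blocks, pd_lsn as the first 8 bytes of each page header, and the server's byte order.

```python
import struct

BLCKSZ = 8192  # assumption: server built with the default 8 kB block size


def parse_lsn(text: str) -> int:
    """Turn an 'X/Y' WAL location (hex) into one integer that sorts in WAL order."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def backup_start_lsn(label_text: str) -> int:
    """Read the START WAL LOCATION line from the text of a backup_label file."""
    for line in label_text.splitlines():
        if line.startswith("START WAL LOCATION:"):
            return parse_lsn(line.split(":", 1)[1].split()[0])
    raise ValueError("no START WAL LOCATION line found")


def blocks_older_than(relation_bytes: bytes, start_lsn: int) -> list[int]:
    """Block numbers whose page LSN (pd_lsn) is earlier than start_lsn.

    pd_lsn is the first field of each page header: two uint32s (xlogid, xrecoff).
    '<II' assumes a little-endian (e.g. x86) server; use '>II' for big-endian.
    """
    older = []
    for blkno, offset in enumerate(range(0, len(relation_bytes) - BLCKSZ + 1, BLCKSZ)):
        xlogid, xrecoff = struct.unpack_from("<II", relation_bytes, offset)
        page_lsn = (xlogid << 32) | xrecoff
        if page_lsn < start_lsn:
            older.append(blkno)
    return older
```

The relation file would be read in binary mode and passed in whole, together with `backup_start_lsn(label_text)`. Relations larger than 1 GB are split into segment files (`.1`, `.2`, ...), so each segment would need to be read separately and its block numbers offset accordingly.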

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
