Re: how is pitr replay interruption time determined?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: how is pitr replay interruption time determined?
Date: 2007-08-28 22:44:29
Message-ID: 1188341069.4218.93.camel@ebony.site
Lists: pgsql-admin

On Tue, 2007-08-28 at 17:59 -0400, Tom Lane wrote:
> Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> > Is there some way to force checkpoints on a db doing wal replay?
>
> No, it's hardwired to do it when it sees a checkpoint record in the WAL stream.
>
> > pg_control last modified: Mon Aug 27 12:12:55 2007
> > Time of latest checkpoint: Mon Jul 30 19:17:37 2007
>
> After looking again at the code, the "last modified" time is the time
> that a recovery checkpoint was last done, and the "latest checkpoint"
> is the timestamp of the WAL-stream checkpoint record that triggered it.
> In a situation where you're catching up on historical WAL they could be
> far apart, but when a slave is just following the master there shouldn't
> be a huge difference --- not more than the maximum time to fill a WAL
> segment and ship it over to the slave, for sure.
>
> (BTW, I misread it before --- it looks like the "at log time" value
> printed at startup *is* taken from the checkpoint record that it's
> trying to roll forward from.)

That's correct. Sorry for not replying earlier; just back from hols.

Jumping back to the original thought: Robert, you should be using the latest
checkpoint location, not its timestamp, to decide which xlogs to remove.

> Assuming that you're absorbing data from the master at a steady rate,
> the only reason I can see for the timestamps to be so old is if the
> "rm_safe_restartpoint" checks are always failing. I seem to remember
> that we found and fixed a bug that could cause something like that,
> but I can't find any trace of it in the CVS logs. Simon, do you
> recall such a problem post-8.2.0?

Yeh, we traced a problem with GIN indexes to this cause in early June;
Teodor fixed it quickly in REL8_2_STABLE, but that won't be available
until 8.2.5.

I'd be happier if we also emitted a log message such as

	ereport(DEBUG2,
			(errmsg("RM %d not safe to record restart point at %X/%X",
					rmid,
					checkPoint->redo.xlogid,
					checkPoint->redo.xrecoff)));

to help trace such things in future.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
