Re: PITR COPY Failure (was Point in Time Recovery)

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PITR COPY Failure (was Point in Time Recovery)
Date: 2004-07-20 11:57:16
Message-ID: 1090324635.28049.2554.camel@stromboli
Lists: pgsql-admin pgsql-hackers pgsql-patches

On Tue, 2004-07-20 at 05:14, Tom Lane wrote:
> Mark Kirkwood <markir(at)coretech(dot)co(dot)nz> writes:
> > I have been doing some re-testing with CVS HEAD from about 1 hour ago
> > using the simplified example posted previously.
>
> > It is quite interesting:
>
> The problem seems to be that the computation of checkPoint.redo at
> xlog.c lines 4162-4169 (all line numbers are per CVS tip) is not
> allowing for the possibility that XLogInsert will decide it doesn't
> want to split the checkpoint record across XLOG files, and will then
> insert a WASTED_SPACE record to avoid that (see comment and following
> code at lines 758-795). This wouldn't really matter except that there
> is a safety crosscheck at line 4268 that tries to detect unexpected
> insertions of other records during a shutdown checkpoint.
>
> I think the code in CreateCheckPoint was correct when it was written,
> because we only recently changed XLogInsert to not split records
> across files. But it's got a boundary-case bug now, which your test
> scenario is able to exercise by making the recovery run try to write
> a shutdown checkpoint exactly at the end of a WAL file segment.
>
> The quick and dirty solution would be to dike out the safety check at
> 4268ff.
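To make the boundary case concrete, here is a minimal C sketch. It uses a deliberately simplified model, which is an assumption and not PostgreSQL's actual xlog.c code. In this model, WAL positions are flat byte offsets and segments are 16 MB. Page headers, record headers and alignment are ignored. A record that would not fit in the rest of the current segment is pushed to the start of the next one, and the tail of the old segment is treated as wasted space. All function names are hypothetical.

The sketch shows that a redo prediction taken straight from the current insert position disagrees with where the record really lands whenever padding occurs. An equality crosscheck therefore fires even though no other record was inserted. Applying the inserter's own padding rule to the prediction makes the two agree.

```c
#include <stdint.h>

/* Assumed segment size for this simplified model (16 MB). */
#define SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical model of the inserter: if a record of rec_len bytes does not
 * fit in the remainder of the current segment, the remainder is wasted
 * (padded) and the record starts at the next segment boundary instead.
 */
uint64_t actual_record_start(uint64_t insert_pos, uint64_t rec_len)
{
    uint64_t seg_remaining = SEG_SIZE - (insert_pos % SEG_SIZE);

    if (rec_len > seg_remaining)
        return insert_pos + seg_remaining;  /* skip the padded tail */
    return insert_pos;                      /* fits: no padding needed */
}

/* Hypothetical naive redo prediction: "the record starts where we are now". */
uint64_t naive_redo(uint64_t insert_pos)
{
    return insert_pos;
}

/* Hypothetical padding-aware prediction: apply the inserter's own rule. */
uint64_t padded_redo(uint64_t insert_pos, uint64_t rec_len)
{
    return actual_record_start(insert_pos, rec_len);
}

/*
 * Hypothetical crosscheck: during a shutdown checkpoint nothing else should
 * be inserted, so the predicted start must equal the actual start.
 * Returns 1 if they match, 0 if the check would report an error.
 */
int crosscheck_ok(uint64_t predicted, uint64_t actual)
{
    return predicted == actual;
}
```

For example, with 32 bytes left in a segment and a 64-byte record, `actual_record_start()` returns the next segment boundary while `naive_redo()` still returns the old insert position, so `crosscheck_ok()` fails for the naive prediction and succeeds for `padded_redo()`. The sketch only models the redo/crosscheck mismatch. It does not model what happens when the padded segment is later read back during recovery.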

Well, taking out the safety check isn't the answer.

The check produces the last error message "concurrent transaction...",
but it isn't the cause of the mismatch in the first place.

If you take out that check, we still fail, because the wasted space at
the end of the segment causes a "record with zero length" error.

> I'm not real sure whether it's better to adjust
> the computation of checkPoint.redo or to smarten the safety check
> ... but one or the other needs to allow for file-end padding, or maybe
> we could hack some update of the state in WasteXLInsertBuffer(). (But
> at some point you have to say "this is more trouble than it's worth",
> so maybe we'll end up taking out the safety check.)

...I'm looking at other options now.

Best Regards, Simon Riggs
