Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Date: 2010-03-23 07:17:53
Message-ID: 3f0b79eb1003230017v16f4ecbeyc20e75beeffe8f1c@mail.gmail.com
Lists: pgsql-committers, pgsql-docs, pgsql-hackers

Sorry for the delay.

On Fri, Mar 19, 2010 at 8:37 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Here's a patch I've been playing with.

Thanks! I'm reading the patch.

> The idea is that in standby mode,
> the server keeps trying to make progress in the recovery by:
>
> a) restoring files from archive
> b) replaying files from pg_xlog
> c) streaming from master
>
> When recovery reaches an invalid WAL record, typically caused by a
> half-written WAL file, it closes the file and moves to the next source.
> If an error is found in a file restored from archive or in a portion
> just streamed from master, however, a PANIC is thrown, because it's not
> expected to have errors in the archive or in the master.

But in the current behavior (v8.4 and earlier), recovery ends normally
when an invalid record is found in an archived WAL file. Otherwise,
the server could never start normal processing if an archived file
were corrupted for some reason. So an invalid record should not be
treated as a PANIC if the server is not in standby mode or the trigger
file has been created. Thoughts?
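
A minimal sketch in C of the condition proposed above, for
illustration only. Every name here (ErrLevel, WalSource, choose_emode,
standby_mode, triggered) is hypothetical and not taken from the patch;
the enum values merely stand in for the server's error levels.

```c
#include <stdbool.h>

/* Stand-ins for the server's error levels (illustrative, not the real ones). */
typedef enum { LVL_LOG, LVL_PANIC } ErrLevel;

/* Hypothetical tag for where the WAL being read came from. */
typedef enum { SRC_ARCHIVE, SRC_PG_XLOG, SRC_STREAM } WalSource;

/*
 * Pick the level at which an invalid WAL record is reported.
 * Assumption: "triggered" means the trigger file has been created.
 */
static ErrLevel
choose_emode(WalSource src, bool standby_mode, bool triggered)
{
    /* Plain archive recovery, or a triggered standby: end recovery normally. */
    if (!standby_mode || triggered)
        return LVL_LOG;

    /* In standby mode pg_xlog may hold a half-written file: try next source. */
    if (src == SRC_PG_XLOG)
        return LVL_LOG;

    /* In standby mode, archive and streamed WAL are not expected to be bad. */
    return LVL_PANIC;
}
```

Under these assumptions, choose_emode(SRC_ARCHIVE, false, false) yields
LVL_LOG, which matches the v8.4 behavior described above, while
choose_emode(SRC_ARCHIVE, true, false) still yields LVL_PANIC.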

When I tested the patch, the following PANIC was thrown during normal
archive recovery. This seems to result from the above change.
The detailed error sequence:
1. In ReadRecord(), emode was set to PANIC after 00000001000000000000000B
   was read.
2. An attempt was then made to read 00000001000000000000000C, which
   contains the contrecord, using that emode (= PANIC). Since
   00000001000000000000000C did not exist, a PANIC was thrown.

-----------------
LOG:  restored log file "00000001000000000000000B" from archive
cp: cannot stat `../data.arh/00000001000000000000000C': No such file or directory
PANIC:  could not open file "pg_xlog/00000001000000000000000C" (log file 0, segment 12): No such file or directory
LOG:  startup process (PID 17204) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
-----------------
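
The two steps above can be modelled with a small C sketch. It is a toy
model of the sequence as I understand it, not code from ReadRecord():
ReaderState, note_segment_from_archive and open_next_segment are
hypothetical names introduced only for illustration.

```c
#include <stdbool.h>

/* Stand-ins for the server's error levels (illustrative only). */
typedef enum { LVL_LOG, LVL_PANIC } ErrLevel;

/* Outcome of trying to open the next WAL segment in this toy model. */
typedef enum { OPEN_OK, OPEN_FAILED_LOG, OPEN_FAILED_PANIC } OpenResult;

/* Hypothetical reader state: the error level carried between reads. */
typedef struct
{
    ErrLevel emode;
} ReaderState;

/* Step 1: after a segment is restored from archive, emode becomes PANIC. */
static void
note_segment_from_archive(ReaderState *st)
{
    st->emode = LVL_PANIC;
}

/*
 * Step 2: the next segment (holding the contrecord) is opened with the
 * emode left over from step 1, so a missing file is reported at that level.
 */
static OpenResult
open_next_segment(const ReaderState *st, bool segment_exists)
{
    if (segment_exists)
        return OPEN_OK;
    return (st->emode == LVL_PANIC) ? OPEN_FAILED_PANIC : OPEN_FAILED_LOG;
}
```

In this model, the missing 00000001000000000000000C corresponds to
calling open_next_segment() with segment_exists = false after
note_segment_from_archive(), which gives OPEN_FAILED_PANIC.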

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
