Re: Failing start-up archive recovery at Standby mode in PG9.2.4

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kyotaro HORIGUCHI <kyota(dot)horiguchi(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date: 2013-05-07 13:40:41
Message-ID: 51890459.9030905@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.04.2013 11:51, KONDO Mitsumasa wrote:
> Hi,
>
> I discavered the problem cause. I think taht horiguchi's discovery is
> another problem...
> Problem has CreateRestartPoint. In recovery mode, PG should not WAL record.
> Because PG does not know latest WAL file location.
> But in this problem case, PG(standby) write WAL file at RestartPoint in
> archive recovery.
> In recovery mode, I think that RestartPoint can write only
> MinRecoveryPoint.
>
> Here is Standby's pg_xlog directory in problem caused.
>> [mitsu-ko(at)localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
>> 000000020000000000000003 000000020000000000000007
>> 00000002000000000000000B 00000003.history
>> 000000020000000000000004 000000020000000000000008
>> 00000002000000000000000C 00000003000000000000000E
>> 000000020000000000000005 000000020000000000000009
>> 00000002000000000000000D 00000003000000000000000F
>> 000000020000000000000006 00000002000000000000000A
>> 00000002000000000000000E archive_status
>
> This problem case is here.
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: attempting to remove
>> WAL segments older than log file 000000030000000000000002
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3568
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction
>> log file "000000010000000000000002"
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3607
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction
>> log file "000000020000000000000002"
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3607
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint complete:
>> wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2
>> recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3,
>> longest=1.176 s, average=0.392 s
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: recovery restart point
>> at 0/90FE448
>> [Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction
>> was at log time 2013-04-26 04:25:53.203725-04
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint,
>> xlog.c:8601
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint starting: xlog
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart,
>> xlog.c:7821
>> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイル
>> やディレクトリはありません
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file
>> "00000003000000000000000F" from archive: return code 256
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile,
>> xlog.c:3323
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: unexpected pageaddr
>> 0/2000000 in log file 0, segment 15, offset 0
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
>> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイル
>> やディレクトリはありません
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file
>> "00000003000000000000000F" from archive: return code 256
>
> In recovery, pg normary search WAL file at archive recovery.
> If propery WAL file is nothing(archive command is failed), next search
> pg_xlog directory.
> Normary, propety WAL file is nothing in pg_xlog.
> But this case has propety name's WAL file(But it's mistaken and illegal)
> in pg_xlog.
> So recovery is failed and broken Standby.
>
> So I fix CreateRestartPoint at branching point of executing
> MinRecoveryPoint.
> It seems to fix this problem, but attached patch is plain.

I didn't understand this. I committed a fix for the issue where recycled
WAL files get in the way of recovery, but I'm not sure if you're
describing the same issue or something else. But before we dig any
deeper into this, can you verify if you're still seeing any issues on
9.3beta1?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-05-07 14:00:15 Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
Previous Message Stephen Frost 2013-05-07 13:27:30 Re: pg_dump --snapshot