Re: Failing start-up archive recovery at Standby mode in PG9.2.4

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <kyota(dot)horiguchi(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date: 2013-04-26 08:51:30
Message-ID: 517A4012.9030502@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I discavered the problem cause. I think taht horiguchi's discovery is another problem...
Problem has CreateRestartPoint. In recovery mode, PG should not WAL record.
Because PG does not know latest WAL file location.
But in this problem case, PG(standby) write WAL file at RestartPoint in archive recovery.
In recovery mode, I think that RestartPoint can write only MinRecoveryPoint.

Here is Standby's pg_xlog directory in problem caused.
> [mitsu-ko(at)localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
> 000000020000000000000003 000000020000000000000007 00000002000000000000000B 00000003.history
> 000000020000000000000004 000000020000000000000008 00000002000000000000000C 00000003000000000000000E
> 000000020000000000000005 000000020000000000000009 00000002000000000000000D 00000003000000000000000F
> 000000020000000000000006 00000002000000000000000A 00000002000000000000000E archive_status

This problem case is here.
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: attempting to remove WAL segments older than log file 000000030000000000000002
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000010000000000000002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000020000000000000002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: recovery restart point at 0/90FE448
> [Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint starting: xlog
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイルやディレクトリはありません
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: unexpected pageaddr 0/2000000 in log file 0, segment 15, offset 0
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイルやディレクトリはありません
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256

In recovery, pg normary search WAL file at archive recovery.
If propery WAL file is nothing(archive command is failed), next search pg_xlog directory.
Normary, propety WAL file is nothing in pg_xlog.
But this case has propety name's WAL file(But it's mistaken and illegal) in pg_xlog.
So recovery is failed and broken Standby.

So I fix CreateRestartPoint at branching point of executing MinRecoveryPoint.
It seems to fix this problem, but attached patch is plain.

Best Regard,
--
NTT Open Source Software Center
Mitsumasa KONDO

Attachment Content-Type Size
fix-ArchiveRecovery-in-StandbyMode.patch text/x-diff 683 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Flower 2013-04-26 09:00:26 Re: pg_controldata gobbledygook
Previous Message Ants Aasma 2013-04-26 08:28:16 Re: Substituting Checksum Algorithm (was: Enabling Checksums)