Re: Serious problem: media recovery fails after system or PostgreSQL crash

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serious problem: media recovery fails after system or PostgreSQL crash
Date: 2012-12-16 16:38:09
Message-ID: 50CDF8F1.6040808@fuzzy.cz
Lists: pgsql-hackers

On 8.12.2012 03:08, Jeff Janes wrote:
> On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> Hi,
>>
>> On 6.12.2012 23:45, MauMau wrote:
>>> From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>>>> Well, that's unfortunate, but it's not clear that automatic recovery is
>>>> possible. The only way out of it would be if an undamaged copy of the
>>>> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
>>>> not even be trying to fetch from the archive if we had a local copy.
>>>
>>> No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
>>> from archive. XLogFileReadAnyTLI() does that. Also, PG manual contains
>>> the following description:
>>>
>>> http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>>>
>>>
>>> WAL segments that cannot be found in the archive will be sought in
>>> pg_xlog/; this allows use of recent un-archived segments. However,
>>> segments that are available from the archive will be used in preference
>>> to files in pg_xlog/.
>>
>> So why don't you use an archive command that does not create such
>> incomplete files? I mean something like this:
>>
>> archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'
>>
>> Until the file is renamed, it's considered 'incomplete'.
>
> Wouldn't having the incomplete file be preferable over having none of it at all?
>
> It seems to me you need considerable expertise to figure out how to do
> optimal recovery (i.e. losing the least transactions) in this
> situation, and that that expertise cannot be automated. Do you trust
> a partial file from a good hard drive, or a complete file from a
> partially melted pg_xlog?

It clearly is a rather complex issue, no doubt about that. And yes,
reliability of the devices with pg_xlog on them is an important detail.
Although if the WAL is not written in a reliable way, you're hosed
anyway, I guess.

The recommended archive command is based on the assumption that the
local pg_xlog is intact (e.g. because it's located on a reliable RAID1
array), which seems to be the assumption of the OP too.
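
To make the copy-then-rename idea behind that archive command concrete,
here is a rough sketch in Python. The helper name archive_segment and its
arguments are hypothetical (nothing like it ships with PostgreSQL), and
the fsync calls are an addition that the one-line cp/mv version does not
have. It assumes a POSIX system and that the temporary and final names
live on the same filesystem.

```python
import os
import shutil


def archive_segment(src_path, archive_dir, file_name):
    """Copy a WAL segment into archive_dir so that the final name only
    ever refers to a complete file.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    final_path = os.path.join(archive_dir, file_name)
    tmp_path = final_path + ".tmp"

    # Refuse to overwrite a segment that has already been archived.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Copy under a temporary name; a crash here leaves only the .tmp file,
    # which recovery never asks for.
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())

    # rename() is atomic within one filesystem, so the final name appears
    # only once the whole file has been written.
    os.rename(tmp_path, final_path)

    # Flush the directory entry as well (POSIX only).
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    return final_path
```

A wrapper script around something like this could be called from
archive_command with %p and %f as its arguments, and would need to exit
with a nonzero status on any failure so that the server retries the
segment later.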

In my opinion you're more likely to encounter an incomplete copy of a WAL
segment in the archive than a corrupted local one. And if the local WAL
really is corrupted, that would be detected during replay.

Tomas
