Serious problem: media recovery fails after system or PostgreSQL crash

From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Serious problem: media recovery fails after system or PostgreSQL crash
Date: 2012-12-06 14:41:39
Message-ID: A70482CA20CD460CB1053F2177CD7789@maumau
Lists: pgsql-hackers

Hello,

Although this may belong on pgsql-bugs or pgsql-general, let me ask here
because the problem probably needs a fix in PostgreSQL's code.

[Problem]
I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious problem: media
recovery failed with the following message:

FATAL: archive file "000000010000008000000028" has wrong size: 7340032 instead of 16777216

I'm using the ordinary cp command to archive WAL files, that is:

archive_command = '/path/to/my_script.sh "%p" "/backup/archive_log/%f"'

<<my_script.sh>>
--------------------------------------------------
#!/bin/sh
some processing...
cp "$1" "$2"
other processing...
--------------------------------------------------
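The script above copies straight to the final file name, so an interrupted cp leaves a short file named %f in the archive. A common mitigation is to copy to a temporary name, fsync, and rename only once the copy is complete, so the archive holds either no file or a whole one. Below is a minimal C sketch of such a helper, assuming a POSIX system; archive_copy() and the ".tmp" suffix are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of PostgreSQL: copy src to dst so that dst
 * is either absent or complete. Data is written to "<dst>.tmp", fsync'ed,
 * and renamed to dst only after the whole copy succeeded.
 * (A fully durable version would also fsync the archive directory.)
 * Returns 0 on success, -1 on failure.
 */
int archive_copy(const char *src, const char *dst)
{
    char tmp[4096];
    char buf[65536];
    ssize_t n;
    int in, out;

    assert(src != NULL && dst != NULL);

    /* Never overwrite an existing archived segment. */
    if (access(dst, F_OK) == 0)
    {
        errno = EEXIST;
        return -1;
    }
    if (snprintf(tmp, sizeof(tmp), "%s.tmp", dst) >= (int) sizeof(tmp))
    {
        errno = ENAMETOOLONG;
        return -1;
    }

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return -1;
    }

    /* Copy everything, handling short writes. */
    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        const char *p = buf;

        while (n > 0)
        {
            ssize_t w = write(out, p, (size_t) n);

            if (w < 0)
                goto fail;
            p += w;
            n -= w;
        }
    }
    /* n < 0 means read() failed; otherwise flush the data to disk. */
    if (n < 0 || fsync(out) != 0)
        goto fail;

    close(in);
    /* rename() switches the name atomically: no half-written dst is visible. */
    if (close(out) != 0 || rename(tmp, dst) != 0)
    {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(in);
    close(out);
    unlink(tmp);
    return -1;
}
```

If a program built around this helper is killed midway, only the .tmp file is left behind, and because the command did not return 0 the archiver would try that segment again later; restore_command never asks for the .tmp name, so recovery would not see the partial copy.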

The media recovery was triggered by a power failure: the disk drive that
stored $PGDATA failed after the power went out. So I replaced the failed disk
and performed media recovery by creating recovery.conf and running pg_ctl
start. However, pg_ctl failed with the above error message.

[Cause]
The cause is clear from the message. PostgreSQL refuses to continue media
recovery when it finds an archived WAL file whose size is not 16 MB (16777216
bytes, the default WAL segment size). The relevant code is in
src/backend/access/transam/xlog.c:

--------------------------------------------------
if (expectedSize > 0 && stat_buf.st_size != expectedSize)
{
    int         elevel;

    /*
     * If we find a partial file in standby mode, we assume it's
     * because it's just being copied to the archive, and keep
     * trying.
     *
     * Otherwise treat a wrong-sized file as FATAL to ensure the
     * DBA would notice it, but is that too strong? We could try
     * to plow ahead with a local copy of the file ... but the
     * problem is that there probably isn't one, and we'd
     * incorrectly conclude we've reached the end of WAL and we're
     * done recovering ...
     */
    if (StandbyMode && stat_buf.st_size < expectedSize)
        elevel = DEBUG1;
    else
        elevel = FATAL;
    ereport(elevel,
            (errmsg("archive file \"%s\" has wrong size: %lu instead of %lu",
                    xlogfname,
                    (unsigned long) stat_buf.st_size,
                    (unsigned long) expectedSize)));
    return false;
}
--------------------------------------------------
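To make the decision in the quoted code easier to follow, here is a simplified standalone C model of it. The enum and function names are hypothetical, not PostgreSQL's; the real code picks an elevel and calls ereport(), and with FATAL the server stops instead of returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical outcome codes standing in for the elevel chosen in xlog.c. */
typedef enum
{
    SEGMENT_OK,     /* size matches (or is not checked): use the file */
    SEGMENT_RETRY,  /* DEBUG1 + return false: assumed still being copied */
    SEGMENT_FATAL   /* FATAL: recovery stops */
} SegmentCheck;

/*
 * Simplified model of the size check quoted above. expected_size <= 0
 * means "don't check", mirroring the expectedSize > 0 guard.
 */
SegmentCheck check_restored_segment(bool standby_mode, off_t actual_size,
                                    off_t expected_size)
{
    assert(actual_size >= 0);

    if (expected_size <= 0 || actual_size == expected_size)
        return SEGMENT_OK;
    /* A short file in standby mode: keep trying later. */
    if (standby_mode && actual_size < expected_size)
        return SEGMENT_RETRY;
    /* Everything else, including a short file during plain media recovery. */
    return SEGMENT_FATAL;
}
```

With the sizes from the error message above, check_restored_segment(false, 7340032, 16777216) returns SEGMENT_FATAL, while the same short file on a standby would give SEGMENT_RETRY.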

[How to fix]
An archived file can end up smaller than its expected size for several
reasons:

1. The power fails while archive_command is copying the file (as in my case).
2. An immediate shutdown (pg_ctl stop -mi) is performed while archive_command
   is copying the file. In this case, cp or an equivalent copy command is
   cancelled by the SIGQUIT sent by the postmaster.

Therefore, I think PostgreSQL must continue recovery by fetching files from
pg_xlog/ when it encounters a partially filled archive file. In addition, it
may be necessary to remove partially filled archived files, because they
might prevent media recovery in the future (is this true?). In other words, I
think we need the following fix. What do you think?

--------------------------------------------------
if (expectedSize > 0 && stat_buf.st_size != expectedSize)
{
    int         elevel;
    ...
    if (StandbyMode && stat_buf.st_size < expectedSize)
        elevel = DEBUG1;
    else
    {
        elevel = LOG;
        unlink(xlogpath);
    }
    ereport(elevel,
            (errmsg("archive file \"%s\" has wrong size: %lu instead of %lu",
                    xlogfname,
                    (unsigned long) stat_buf.st_size,
                    (unsigned long) expectedSize)));
    return false;
}
--------------------------------------------------
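The behaviour this proposal aims for can be modelled outside the server in a few lines of C. This is a hypothetical sketch: WalSource, choose_wal_source() and its path parameters are illustrative names. It assumes that xlogpath in the quoted code names the copy just produced by restore_command, and that returning false lets recovery fall back to the segment already in pg_xlog/, as the proposal intends.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
    USE_RESTORED,   /* restored copy has the expected size: use it */
    USE_LOCAL,      /* fall back to the segment already in pg_xlog/ */
    NOT_FOUND       /* neither copy is usable */
} WalSource;

/* Nonzero if path exists and is exactly `expected` bytes long. */
static int has_expected_size(const char *path, off_t expected)
{
    struct stat st;

    return stat(path, &st) == 0 && st.st_size == expected;
}

/*
 * Sketch of the proposed behaviour outside standby mode: a wrong-sized
 * restored segment is reported at LOG level and deleted (the proposal's
 * unlink(xlogpath)) instead of raising FATAL, and recovery then tries
 * the local copy in pg_xlog/.
 */
WalSource choose_wal_source(const char *restored_path, const char *local_path,
                            off_t expected)
{
    struct stat st;

    assert(expected > 0);
    if (stat(restored_path, &st) == 0)
    {
        if (st.st_size == expected)
            return USE_RESTORED;
        fprintf(stderr, "LOG:  archive file \"%s\" has wrong size: %lu instead of %lu\n",
                restored_path, (unsigned long) st.st_size, (unsigned long) expected);
        unlink(restored_path);
    }
    return has_expected_size(local_path, expected) ? USE_LOCAL : NOT_FOUND;
}
```

Note the NOT_FOUND case: as the comment in xlog.c warns, if there is no usable local copy either, recovery may wrongly conclude it has reached the end of WAL, which is why the current code prefers FATAL.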

I've heard that the next minor release is scheduled for this weekend. I
really hope this problem can be fixed in that release. If you wish, I'll post
the patch tomorrow or the day after. Could you include the fix in the weekend
release?

Regards
MauMau
