Re: Detecting corrupted pages earlier

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detecting corrupted pages earlier
Date: 2003-03-28 20:27:34
Message-ID: 24613.1048883254@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp> writes:
> How about adding a new option to skip corrupted pages ?

I have committed changes to implement checking for damaged page headers,
along the lines of last month's discussion. It includes a GUC variable
to control the response as suggested by Hiroshi.

Given the number of data-corruption reports we've seen recently, I am
more than half tempted to commit the change into the 7.3.* branch too.
However the GUC variable makes it seem a little like a new feature.

I could backpatch the whole change, or backpatch it without the GUC
variable (so the only possible response is elog(ERROR)), or leave
well enough alone. Any votes?

The patch is pretty small; I include the non-boilerplate parts below.

regards, tom lane

*** /home/tgl/pgsql/src/backend/storage/buffer/bufmgr.c.orig Tue Mar 25 09:06:11 2003
--- /home/tgl/pgsql/src/backend/storage/buffer/bufmgr.c Fri Mar 28 11:54:48 2003
***************
*** 59,64 ****
--- 60,69 ----
(*((XLogRecPtr*) MAKE_PTR((bufHdr)->data)))


+ /* GUC variable */
+ bool zero_damaged_pages = false;
+
+
static void WaitIO(BufferDesc *buf);
static void StartBufferIO(BufferDesc *buf, bool forInput);
static void TerminateBufferIO(BufferDesc *buf);
***************
*** 217,222 ****
--- 222,241 ----
{
status = smgrread(DEFAULT_SMGR, reln, blockNum,
(char *) MAKE_PTR(bufHdr->data));
+ /* check for garbage data */
+ if (status == SM_SUCCESS &&
+ !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data)))
+ {
+ if (zero_damaged_pages)
+ {
+ elog(WARNING, "Invalid page header in block %u of %s; zeroing out page",
+ blockNum, RelationGetRelationName(reln));
+ MemSet((char *) MAKE_PTR(bufHdr->data), 0, BLCKSZ);
+ }
+ else
+ elog(ERROR, "Invalid page header in block %u of %s",
+ blockNum, RelationGetRelationName(reln));
+ }
}

if (isLocalBuf)
*** /home/tgl/pgsql/src/backend/storage/page/bufpage.c.orig Tue Mar 25 09:06:12 2003
--- /home/tgl/pgsql/src/backend/storage/page/bufpage.c Fri Mar 28 11:38:43 2003
***************
*** 48,53 ****
--- 46,96 ----
}


+ /*
+ * PageHeaderIsValid
+ * Check that the header fields of a page appear valid.
+ *
+ * This is called when a page has just been read in from disk. The idea is
+ * to cheaply detect trashed pages before we go nuts following bogus item
+ * pointers, testing invalid transaction identifiers, etc.
+ *
+ * It turns out to be necessary to allow zeroed pages here too. Even though
+ * this routine is *not* called when deliberately adding a page to a relation,
+ * there are scenarios in which a zeroed page might be found in a table.
+ * (Example: a backend extends a relation, then crashes before it can write
+ * any WAL entry about the new page. The kernel will already have the
+ * zeroed page in the file, and it will stay that way after restart.) So we
+ * allow zeroed pages here, and are careful that the page access macros
+ * treat such a page as empty and without free space. Eventually, VACUUM
+ * will clean up such a page and make it usable.
+ */
+ bool
+ PageHeaderIsValid(PageHeader page)
+ {
+ char *pagebytes;
+ int i;
+
+ /* Check normal case */
+ if (PageGetPageSize(page) == BLCKSZ &&
+ PageGetPageLayoutVersion(page) == PG_PAGE_LAYOUT_VERSION &&
+ page->pd_lower >= SizeOfPageHeaderData &&
+ page->pd_lower <= page->pd_upper &&
+ page->pd_upper <= page->pd_special &&
+ page->pd_special <= BLCKSZ &&
+ page->pd_special == MAXALIGN(page->pd_special))
+ return true;
+
+ /* Check all-zeroes case */
+ pagebytes = (char *) page;
+ for (i = 0; i < BLCKSZ; i++)
+ {
+ if (pagebytes[i] != 0)
+ return false;
+ }
+ return true;
+ }
+
+
/* ----------------
* PageAddItem
*

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jason M. Felice 2003-03-28 20:42:43 EXPLAIN ANALYZE in comparable units
Previous Message Jason M. Felice 2003-03-28 18:52:20 Re: PostgreSQL and SOAP, version 7.4/8.0