Re: All-zero page in GIN index causes assertion failure

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: All-zero page in GIN index causes assertion failure
Date: 2015-07-20 14:15:02
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/20/2015 11:14 AM, Heikki Linnakangas wrote:
> ISTM ginvacuumcleanup should check for PageIsNew, and put the page to
> the FSM. That's what btvacuumpage() gistvacuumcleanup() do.
> spgvacuumpage() seems to also check for PageIsNew(), but it seems broken
> in a different way: it initializes the page and marks the page as dirty,
> but it is not WAL-logged. That is a problem at least if checksums are
> enabled: if you crash you might have a torn page on disk, with invalid
> checksum.

Looking closer, heap vacuum does a similar thing, but there are two
mitigating factors that make it safe in practice, I think:

1. An empty heap page is all-zeroes except for the small page header in
the beginning of the page. For a torn page to matter, the page would
need to be torn in the header, but we rely elsewhere (control file)
anyway that a 512-byte sector update is atomic, so that shouldn't
happen. Note that this hinges on the fact that there is no special area
on heap pages, so you cannot rely on this for index pages.

2. The redo of the first insert/update on a heap page will always
re-initialize the page, even when full-page-writes are disabled. This is
the XLOG_HEAP_INIT_PAGE optimization. So it's not just an optimization,
it's required for correctness.

Heap update can also leave behind a page in the buffer cache that's been
initialized by RelationGetBufferForTuple but not yet WAL-logged.
However, it doesn't mark the buffer dirty, so the torn-page problem
cannot happen because the page will not be flushed to disk if nothing
else touches it. The XLOG_HEAP_INIT_PAGE optimization is needed in that
case too, however.

B-tree, GiST, and SP-GiST's relation extension work similarly, but they
have other mitigating factors. If a newly-initialized B-tree page is
left behind in the relation, it won't be reused for anything, and vacuum
will ignore it (by accident, I think; there is no explicit comment on
what will happen to such pages, but it will be treated like an internal
page and ignored). Eventually the buffer will be evicted from cache, and
because it's not marked as dirty, it will not be flushed to disk, and
will later be read back as all-zeros and vacuum will recycle it.

BRIN update is not quite right, however. brin_getinsertbuffer() can
initialize a page, but the caller might bail out without using the page
and WAL-logging the change. If that happens, the next update that uses
the same page will WAL-log the change but it will not use the
XLOG_BRIN_INIT_PAGE option, and redo will not initialize the page. Redo
will fail.

BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN
pages? If an empty page is missing from the FSM for any reason, there's
nothing to add it there.

This is all very subtle. The whole business of leaving behind an
already-initialized page in the buffer cache, without marking the buffer
as dirty, is pretty ugly. I wish we had a more robust pattern to handle
all-zero pages and relation extension.

Thoughts? As a minimal backpatchable fix, I think we should add the
check in ginvacuumpage() to initialize any all-zeros pages it
encounters. That needs to be WAL-logged, and WAL-logging needs to be
added to the page initialization in spgvacuumpage too.

- Heikki

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2015-07-20 14:17:49 Re: pg_trgm version 1.2
Previous Message Tom Lane 2015-07-20 13:49:13 Re: [HACKERS] object_classes array is broken, again