Re: Lots of FSM-related fragility in transaction commit

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lots of FSM-related fragility in transaction commit
Date: 2011-12-19 23:26:03
Message-ID: 5430.1324337163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 08.12.2011 08:20, Tom Lane wrote:
>> I thought that removing the unreadable file would let me restart,
>> but after "rm 52860_fsm" and trying again to start the server,
>> there's a different problem:
>> LOG: consistent recovery state reached at 0/5D71BA8
>> LOG: redo starts at 0/5100050
>> WARNING: page 18 of relation base/27666/52860 is uninitialized
>> CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18
>> PANIC: WAL contains references to invalid pages

> The bug here is that we consider having immediately reached consistency
> during crash recovery.

That fixes only part of the problem I was on about, though: I don't
believe that this type of situation should occur in the first place.
We should not be throwing ERROR during post-commit attempts to unlink
the physical files of deleted relations.

After some investigation I believe that the problem was introduced
by the changes to support multiple "forks" in a relation. Specifically,
instead of just doing "smgrdounlink()" during post-commit, we now
do something like this:

for (fork = 0; fork <= MAX_FORKNUM; fork++)
{
if (smgrexists(srel, fork))
smgrdounlink(srel, fork, false);
}

and if you look into mdexists() you'll find that it throws ERROR for any
unexpected errno. This makes smgrdounlink's attempts to be robust
rather pointless.

I'm not entirely sure whether that behavior of mdexists is a good thing,
but it strikes me that the smgrexists call is pretty unnecessary here in
the first place. We should just get rid of it and do smgrdounlink
unconditionally. The only reason it's there is to keep smgrdounlink
from whinging about ENOENT on non-primary forks, which is simply stupid
for smgrdounlink to be doing anyway --- it is actually *more* willing to
complain about ENOENT on non-primary than primary forks, which has no
sense that I can detect.

Accordingly I propose the attached patch for HEAD, and similar changes
in all branches that have multiple forks. Note that this makes the
warning-report conditions identical for both code paths in mdunlink.

regards, tom lane

Attachment Content-Type Size
unrecoverable-crash-during-unlink.patch text/x-patch 4.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2011-12-19 23:26:37 Re: JSON for PG 9.2
Previous Message Kevin Grittner 2011-12-19 23:14:16 Re: Page Checksums