Re: FSM corruption leading to errors

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM corruption leading to errors
Date: 2016-10-19 10:07:56
Message-ID: CABOikdO=Tryjc9CiKBdbXP3KjcRGZgNY9mT=AKLFszUCRpEgQw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 19, 2016 at 2:37 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

>
>>
> Actually, this is still not 100% safe. Flushing the WAL before modifying
> the FSM page is not enough. We also need to WAL-log a full-page image of
> the FSM page, otherwise we are still vulnerable to the torn page problem.
>
> I came up with the attached. This is fortunately much simpler than my
> previous attempt. I replaced the MarkBufferDirtyHint() calls with
> MarkBufferDirty(), to fix the original issue, plus WAL-logging a full-page
> image to fix the torn page issue.
>
>
Looks good to me.
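
For anyone following the thread, the torn page problem mentioned above can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the 512-byte atomic sector size, the ToyPage type, and every toy_* function are hypothetical names made up for illustration. It shows a crash in the middle of an 8192-byte page write leaving a page that is neither the old nor the new version, and a full-page image kept in the log being enough to repair it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE   8192                      /* PostgreSQL's default block size */
#define SECTOR_SIZE 512                       /* assumed atomic write unit of the toy disk */
#define SECTORS     (PAGE_SIZE / SECTOR_SIZE)

/* Hypothetical page type: just a raw block of bytes. */
typedef struct { unsigned char bytes[PAGE_SIZE]; } ToyPage;

/* Write only the first sectors_written sectors, as if a crash interrupted the write. */
void toy_partial_write(ToyPage *disk, const ToyPage *mem, int sectors_written)
{
    assert(sectors_written >= 0 && sectors_written <= SECTORS);
    memcpy(disk->bytes, mem->bytes, (size_t) sectors_written * SECTOR_SIZE);
}

/* A page is torn if it matches neither the old nor the new version. */
bool toy_is_torn(const ToyPage *disk, const ToyPage *older, const ToyPage *newer)
{
    return memcmp(disk->bytes, older->bytes, PAGE_SIZE) != 0 &&
           memcmp(disk->bytes, newer->bytes, PAGE_SIZE) != 0;
}

/* Recovery step: overwrite the whole page with the logged full-page image. */
void toy_restore_from_fpi(ToyPage *disk, const ToyPage *fpi)
{
    memcpy(disk->bytes, fpi->bytes, PAGE_SIZE);
}

/* Returns 1 if the page is torn after the crash and intact after recovery. */
int toy_torn_page_scenario(void)
{
    ToyPage old_page, new_page, disk, fpi;

    memset(old_page.bytes, 0xAA, PAGE_SIZE);
    memset(new_page.bytes, 0xBB, PAGE_SIZE);

    disk = old_page;   /* on-disk state before the update */
    fpi = new_page;    /* full-page image logged before the data write */

    toy_partial_write(&disk, &new_page, SECTORS / 2);   /* crash halfway through */
    bool torn = toy_is_torn(&disk, &old_page, &new_page);

    toy_restore_from_fpi(&disk, &fpi);
    bool repaired = memcmp(disk.bytes, new_page.bytes, PAGE_SIZE) == 0;

    return torn && repaired;
}
```

Without the logged image, recovery in this toy model would have nothing to rebuild the page from, which is the gap the full-page image in the patch is meant to close.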

>> BTW any thoughts on race-condition on the primary? Comments at
>> MarkBufferDirtyHint() seems to suggest that a race condition is possible
>> which might leave the buffer without the DIRTY flag, but I'm not sure if
>> that can only happen when the page is locked in shared mode.
>>
>
> I think the race condition can only happen when the page is locked in
> shared mode. In any case, with this proposed fix, we'll use
> MarkBufferDirty() rather than MarkBufferDirtyHint(), so it's moot.
>
>
Yes, the fix will cover that problem (if it exists). I was curious because
there have been several reports of similar errors in the past, and some of
them did not involve a standby. Those reports mostly remained unresolved, and
I wondered whether this race explains them. But yeah, my conclusion was that
the race is not possible with the page locked in EXCLUSIVE mode. So maybe
there is another problem somewhere, or a crash recovery may have left the FSM
in an inconsistent state.

Anyway, we seem good to go with the patch.

Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
