Re: FSM corruption leading to errors

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM corruption leading to errors
Date: 2016-10-19 09:07:13
Message-ID: acc4128a-bc4c-baca-92ae-14eefac640ea@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/18/2016 07:01 AM, Pavan Deolasee wrote:
> On Mon, Oct 17, 2016 at 4:34 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
>> visibilitymap_truncate is actually also wrong, in a different way. The
>> truncation WAL record is written only after the VM (and FSM) are truncated.
>> But visibilitymap_truncate() has already modified and dirtied the page. If
>> the VM page change is flushed to disk before the WAL record, and you crash,
>> you might have a torn VM page and a checksum failure.
>
> Right, I missed the problem with checksums.
>
>> Simply replacing the MarkBufferDirtyHint() call with MarkBufferDirty() in
>> FreeSpaceMapTruncateRel would have the same issue. If you call
>> MarkBufferDirty(), you must WAL-log the change, and also set the page's LSN
>> to make sure the WAL record is flushed first.
>
> I'm not sure I fully understand this. If the page is flushed before the WAL
> record, we might have a truncated FSM where as the relation truncation is
> not replayed. But that's not the same problem, right? I mean, you might
> have an FSM which is not accurate, but it won't at the least return the
> blocks outside the range. Having said that, I agree your proposed changes
> are more robust.

Actually, this is still not 100% safe. Flushing the WAL before modifying
the FSM page is not enough. We also need to WAL-log a full-page image of
the FSM page, otherwise we are still vulnerable to the torn page problem.

I came up with the attached. This is fortunately much simpler than my
previous attempt. I replaced the MarkBufferDirtyHint() calls with
MarkBufferDirty(), to fix the original issue, plus WAL-logging a
full-page image to fix the torn page issue.

> BTW any thoughts on race-condition on the primary? Comments at
> MarkBufferDirtyHint() seems to suggest that a race condition is possible
> which might leave the buffer without the DIRTY flag, but I'm not sure if
> that can only happen when the page is locked in shared mode.

I think the race condition can only happen when the page is locked in
shared mode. In any case, with this proposed fix, we'll use
MarkBufferDirty() rather than MarkBufferDirtyHint(), so it's moot.

- Heikki

Attachment Content-Type Size
truncation-vm-fsm-2.patch application/x-download 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2016-10-19 09:36:44 Re: Aggregate Push Down - Performing aggregation on foreign server
Previous Message Umair Shahid 2016-10-19 09:06:15 Draft for next update release (scheduled for 27th Oct)