Re: BUG #4879: bgwriter fails to fsync the file in recovery mode

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Date: 2009-06-26 15:07:24
Message-ID: 4A44E42C.3000204@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> - CreateCheckPoint() calls AdvanceXLInsertBuffer() before setting
>> LocalWALWriteAllowed. I don't see anything in AdvanceXLInsertBuffer()
>> that would fail, but it doesn't feel right. While strictly speaking it
>> doesn't insert new WAL records, it does write WAL page headers.
>
> The ordering is necessary because we have to do that before we start
> flushing buffers, and XLogFlush has to see WriteAllowed as FALSE or it
> will do the wrong thing. If we ever put something into
> AdvanceXLInsertBuffer that would depend on this, we could flip the flag
> to TRUE just for the duration of calling AdvanceXLInsertBuffer, though
> I agree that's a bit ugly. Perhaps calling the flag/routine
> XLogInsertAllowed() would alleviate this issue somewhat?

Yeah, it would look much less dirty if called XLogInsertAllowed().

>> - As noted with an XXX comment in the patch, CreateCheckPoint() now
>> resets LocalWALWriteAllowed to false after a shutdown/end-of-recovery
>> checkpoint. But that's not enough to stop WAL inserts after a shutdown
>> checkpoint, because when RecoveryInProgress() is false, we
>> WALWriteAllowed() still returns true. We haven't had such a safeguard in
>> place before, so we can keep living without it, but now that we have a
>> WALWriteAllowed() macro it would be nice if it returned false when WAL
>> writes are no longer allowed after a shutdown checkpoint. (that would've
>> caught a bug in Guillaume Smet's original patch to rotate a WAL segment
>> at shutdown, where the xlog switch was done after shutdown checkpoint)
>
> It would be possible to make LocalWALWriteAllowed a three-state flag:
> * allowed, don't check RecoveryInProgress,
> * disallowed, don't check RecoveryInProgress
> * check RecoveryInProgress, transition to first state if clear
> Not sure if worth the trouble.

One idea is to merge LocalWALWriteAllowed and LocalRecoveryInProgress
(and the corresponding shared variables too) into one XLogState variable
with three states
* in-recovery
* normal operation
* shutdown.

The variable would always move from a lower state to higher, similar to
PMState in postmaster.c, so it could be cached like RecoveryInProgress
works now. WAL inserts would only be allowed in normal operation. An
end-of-recovery checkpoint would set LocalXLogState to "normal
operation" ahead of SharedXLogState, to allow writing the checkpoint
record, and a shutdown checkpoint would set Shared- and LocalXLogState
to "shutdown" right after writing the checkpoint record.

>> On whole, this is probably the right way going forward, but I'm not sure
>> if it'd make 8.4 more or less robust than what's in CVS now.
>
> I think we should do it. There is a lot of stuff I'm still not happy
> with in this area, but without a clean and understandable set of state
> definitions we aren't going to be able to improve it.

I agree, we need clear states and invariants that can be checked and
relied on. This gets even more complicated when the hot standby stuff
gets involved.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2009-06-26 15:20:35 Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Previous Message Tom Lane 2009-06-26 14:49:16 Re: BUG #4879: bgwriter fails to fsync the file in recovery mode