
Re: BUG #4879: bgwriter fails to fsync the file in recovery mode

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Date: 2009-06-25 14:02:36
Lists: pgsql-bugs
Simon Riggs wrote:
> On Thu, 2009-06-25 at 12:55 +0000, Fujii Masao wrote:
>> The following bug has been logged online:
>> Bug reference:      4879
>> Logged by:          Fujii Masao
>> Email address:      masao(dot)fujii(at)gmail(dot)com
>> PostgreSQL version: 8.4dev
>> Operating system:   RHEL5.1 x86_64
>> Description:        bgwriter fails to fsync the file in recovery mode
>> Details: 
> Looking at it now.


>> I suspect that the cause of this error is the race condition between
>> file deletion by startup process and fsync by bgwriter: TRUNCATE xlog
>> record immediately deletes the corresponding file, while it might be
>> scheduled to be fsynced by bgwriter. We should leave the actual file
>> deletion to bgwriter instead of startup process, like normal mode?

I think the real problem is this in mdunlink():

> 	/* Register request to unlink first segment later */
> 	if (!isRedo && forkNum == MAIN_FORKNUM)
> 		register_unlink(rnode);

When we replay the unlink of the relation, we don't tell bgwriter about
it. Normally we do, so bgwriter knows that if the fsync() fails with
ENOENT, it's OK because the file was deleted on purpose.
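A much-simplified model of that bookkeeping may help. It assumes a hypothetical fixed-size table of pending fsync requests in which an unlink request marks the matching entry as canceled; none of these names or structures are PostgreSQL's actual md.c code. The point it illustrates is that ENOENT is only tolerated when an unlink request has been seen for the file.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of bgwriter's pending-fsync table; names and layout
 * are illustrative only, not PostgreSQL's md.c structures. */
#define MAX_PENDING 8

typedef enum { FSYNC_DONE, FSYNC_SKIPPED, FSYNC_FAILED } FsyncOutcome;

typedef struct
{
    unsigned relnode;   /* relation file that needs an fsync */
    bool     canceled;  /* an unlink request arrived for this file */
    bool     in_use;    /* slot holds a live request */
} PendingFsync;

static PendingFsync pending[MAX_PENDING];

static PendingFsync *find_pending(unsigned relnode)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (pending[i].in_use && pending[i].relnode == relnode)
            return &pending[i];
    return NULL;
}

/* A write to relnode asks bgwriter to fsync it at the next checkpoint. */
static bool remember_fsync(unsigned relnode)
{
    if (find_pending(relnode) != NULL)
        return true;                    /* already queued */
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (!pending[i].in_use)
        {
            pending[i] = (PendingFsync){ relnode, false, true };
            return true;
        }
    return false;                       /* table full */
}

/* An unlink request: the file is going away, so ENOENT is expected. */
static void remember_unlink(unsigned relnode)
{
    PendingFsync *p = find_pending(relnode);
    if (p != NULL)
        p->canceled = true;
}

/* Checkpoint step for relnode. fsync_errno is 0 on success or the errno
 * the fsync attempt would have reported (e.g. ENOENT if the file is gone). */
static FsyncOutcome sync_pending(unsigned relnode, int fsync_errno)
{
    PendingFsync *p = find_pending(relnode);
    if (p == NULL)
        return FSYNC_DONE;              /* nothing queued */
    p->in_use = false;
    if (p->canceled)
        return FSYNC_SKIPPED;           /* deleted on purpose; not an error */
    if (fsync_errno == 0)
        return FSYNC_DONE;
    return FSYNC_FAILED;                /* includes ENOENT with no unlink request */
}
```

In this model, a file deleted during redo without a matching remember_unlink() call ends up in the FSYNC_FAILED branch, which corresponds to the error reported in the bug.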

It's tempting to just remove the "!isRedo" condition, but then we have
another problem: if bgwriter hasn't been started yet, and the shmem
queue is full, we get stuck in register_unlink() trying to send the
message and failing.
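The hang can be sketched with a hypothetical bounded request queue whose only consumer is bgwriter. The sender of an unlink request cannot fall back to doing the work itself, so it has to retry until the message is queued. The `max_attempts` cap exists only so the sketch terminates; the behavior described above is an indefinite retry. All names here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size request queue in shared memory; bgwriter is its
 * only consumer. Names are illustrative, not PostgreSQL's. */
#define QUEUE_CAPACITY 4

typedef struct
{
    unsigned requests[QUEUE_CAPACITY];
    size_t   count;
} RequestQueue;

/* Non-blocking send: fails when the queue is full. */
static bool forward_request(RequestQueue *q, unsigned relnode)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->requests[q->count++] = relnode;
    return true;
}

/* bgwriter absorbing everything queued so far. */
static size_t absorb_requests(RequestQueue *q)
{
    size_t n = q->count;
    q->count = 0;
    return n;
}

/* Retry until the unlink request is queued. If bgwriter_running is false,
 * nobody drains the queue and every retry fails. Returns the number of
 * attempts used, or 0 if it gave up (i.e. would have been stuck). */
static size_t register_unlink_sketch(RequestQueue *q, unsigned relnode,
                                     bool bgwriter_running, size_t max_attempts)
{
    for (size_t attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (forward_request(q, relnode))
            return attempt;
        if (bgwriter_running)
            absorb_requests(q);   /* models bgwriter draining between retries */
    }
    return 0;
}
```

With a full queue and no bgwriter, every attempt fails; once a consumer drains the queue, the second attempt succeeds.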

In archive recovery, we always start bgwriter at the beginning of WAL
replay. In crash recovery, we don't start bgwriter until the end of WAL
replay. So we could change the "!isRedo" condition to
"!InArchiveRecovery". It's not a very clean solution, but it's simple.

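The intent here, registering the unlink whenever bgwriter is known to be running, can be written as a small predicate. This is a sketch under assumptions: `ForkNumber` is a simplified stand-in for the real enum, and the two booleans stand in for `isRedo` and `InArchiveRecovery`; it encodes only the timing stated above (bgwriter runs throughout archive recovery but not during crash recovery replay).

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for PostgreSQL's fork enum. */
typedef enum { MAIN_FORKNUM, FSM_FORKNUM } ForkNumber;

/* Normal operation and archive recovery have a running bgwriter;
 * crash recovery replay does not. */
static bool bgwriter_is_running(bool isRedo, bool inArchiveRecovery)
{
    return !isRedo || inArchiveRecovery;
}

/* Register the unlink only for the main fork, and only when someone
 * will drain the request queue. */
static bool should_register_unlink(bool isRedo, bool inArchiveRecovery,
                                   ForkNumber forkNum)
{
    return forkNum == MAIN_FORKNUM &&
           bgwriter_is_running(isRedo, inArchiveRecovery);
}
```

For the main fork, crash recovery is the only case where the predicate is false, which is exactly where a full queue with no bgwriter would otherwise leave register_unlink() stuck.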
Hmm, what happens when the startup process performs a write, and
bgwriter is not running? Do the fsync requests queue up in the shmem
queue until the end of recovery when bgwriter is launched? I guess I'll
have to try it out...

  Heikki Linnakangas

