Re: BUG #4879: bgwriter fails to fsync the file in recovery mode

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Date: 2009-06-25 14:02:36
Message-ID: 4A43837C.2040903@enterprisedb.com
Lists: pgsql-bugs

Simon Riggs wrote:
> On Thu, 2009-06-25 at 12:55 +0000, Fujii Masao wrote:
>> The following bug has been logged online:
>>
>> Bug reference: 4879
>> Logged by: Fujii Masao
>> Email address: masao(dot)fujii(at)gmail(dot)com
>> PostgreSQL version: 8.4dev
>> Operating system: RHEL5.1 x86_64
>> Description: bgwriter fails to fsync the file in recovery mode
>> Details:
>
> Looking at it now.

Thanks.

>> I suspect that the cause of this error is the race condition between
>> file deletion by startup process and fsync by bgwriter: TRUNCATE xlog
>> record immediately deletes the corresponding file, while it might be
>> scheduled to be fsynced by bgwriter. We should leave the actual file
>> deletion to bgwriter instead of startup process, like normal mode?
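A minimal POSIX sketch of the interleaving Fujii describes. It assumes, for illustration only, that the deferred fsync reopens the file by path. The writer closes the file with its fsync still pending, replay then unlinks the file, and the later reopen fails with ENOENT. The path and both function names (`deferred_fsync`, `simulate_race`) are hypothetical; this is not PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: reopen a file by path and fsync it, as a deferred fsync
   pass might. Returns 0 on success or the errno of the failing call. */
static int deferred_fsync(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;                      /* ENOENT if the file is gone */
    int rc = (fsync(fd) == 0) ? 0 : errno; /* capture errno before close() */
    close(fd);
    return rc;
}

/* Hypothetical interleaving: a write leaves an fsync pending for `path`,
   then the file is removed before the deferred fsync runs. */
static int simulate_race(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;
    if (write(fd, "x", 1) != 1) {
        int e = errno;
        close(fd);
        return e;
    }
    close(fd);                  /* fsync is left to the "bgwriter" */
    unlink(path);               /* "startup process" removes the file during replay */
    return deferred_fsync(path);
}
```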

I think the real problem is this in mdunlink():

> /* Register request to unlink first segment later */
> if (!isRedo && forkNum == MAIN_FORKNUM)
> register_unlink(rnode);

When we replay the unlink of the relation, we don't tell bgwriter about
it. Normally we do, so bgwriter knows that if the fsync() fails with
ENOENT, it's ok since the file was deleted.
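A simplified, hypothetical model of the bookkeeping described above. `model_register_unlink()` stands in for register_unlink(), relations are small integers rather than RelFileNodes, and only the ENOENT decision is modeled. The model shows why a skipped unlink registration during redo turns a harmless ENOENT into a reported fsync error. It does not reproduce the actual md.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_RELS 16   /* hypothetical: relations identified by 0..MAX_RELS-1 */

/* Set when the unlink of a relation has been registered with "bgwriter". */
static bool unlink_registered[MAX_RELS];

/* Hypothetical analogue of register_unlink(): note that the file is going away. */
static void model_register_unlink(int rel)
{
    unlink_registered[rel] = true;
}

/* Decide what "bgwriter" does when fsync() of a pending request fails.
   Returns 0 if the failure is harmless, otherwise the error to report. */
static int model_handle_fsync_failure(int rel, int fsync_errno)
{
    if (fsync_errno == ENOENT && unlink_registered[rel])
        return 0;            /* file was deleted on purpose: drop the request */
    return fsync_errno;      /* anything else is a real "could not fsync" error */
}
```

During redo the registration step is skipped, so the first case below is what bgwriter sees in recovery.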

It's tempting to just remove the "!isRedo" condition, but then we have
another problem: if bgwriter hasn't been started yet, and the shmem
queue is full, we get stuck in register_unlink() trying to send the
message and failing.
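A toy model of the hang, assuming a fixed-size request queue and a sender that must not drop the request, so it keeps retrying until the consumer makes room. `RequestQueue`, `try_forward()`, `drain()` and the `max_attempts` bound are all hypothetical. The bound exists only so the model terminates; the retry described above has no bound, so with no consumer it never returns.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE 4   /* hypothetical stand-in for the shmem queue capacity */

typedef struct {
    int  requests[QUEUE_SIZE];
    int  count;
    bool consumer_running;   /* is "bgwriter" alive to drain the queue? */
} RequestQueue;

/* One send attempt: fails when the queue is full. */
static bool try_forward(RequestQueue *q, int rel)
{
    if (q->count >= QUEUE_SIZE)
        return false;
    q->requests[q->count++] = rel;
    return true;
}

/* The consumer empties the queue, but only if it is running. */
static void drain(RequestQueue *q)
{
    if (q->consumer_running)
        q->count = 0;
}

/* Sender that must not lose the request: retry until it fits. */
static bool send_with_retry(RequestQueue *q, int rel, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        if (try_forward(q, rel))
            return true;
        drain(q);            /* wait for the consumer to make room */
    }
    return false;            /* with no consumer, an unbounded loop would hang here */
}
```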

In archive recovery, we always start bgwriter at the beginning of WAL
replay. In crash recovery, we don't start bgwriter until the end of WAL
replay. So we could change the "!isRedo" condition to
"!InArchiveRecovery". It's not a very clean solution, but it's simple.
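A hypothetical sketch of the two registration rules as truth tables. It encodes the intent stated above: register the unlink whenever bgwriter is running, meaning in normal operation or during archive recovery, but not during crash recovery. Written as a boolean expression, that is `(!isRedo || InArchiveRecovery)`. Substituting `!InArchiveRecovery` alone would select the opposite recovery cases. The enum and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MAIN_FORK_MODEL, OTHER_FORK_MODEL } ForkModel;  /* hypothetical */

/* Rule quoted from mdunlink(): never register during redo. */
static bool register_unlink_current(bool isRedo, ForkModel fork)
{
    return !isRedo && fork == MAIN_FORK_MODEL;
}

/* Intended rule: register whenever bgwriter is running, i.e. outside
   redo or during archive recovery; skip it during crash recovery. */
static bool register_unlink_proposed(bool isRedo, bool inArchiveRecovery,
                                     ForkModel fork)
{
    return (!isRedo || inArchiveRecovery) && fork == MAIN_FORK_MODEL;
}
```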

Hmm, what happens when the startup process performs a write, and
bgwriter is not running? Do the fsync requests queue up in the shmem
queue until the end of recovery when bgwriter is launched? I guess I'll
have to try it out...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
