Simon Riggs wrote:
> On Thu, 2009-06-25 at 12:55 +0000, Fujii Masao wrote:
>> The following bug has been logged online:
>> Bug reference: 4879
>> Logged by: Fujii Masao
>> Email address: masao(dot)fujii(at)gmail(dot)com
>> PostgreSQL version: 8.4dev
>> Operating system: RHEL5.1 x86_64
>> Description: bgwriter fails to fsync the file in recovery mode
> Looking at it now.
>> I suspect that the cause of this error is the race condition between
>> file deletion by startup process and fsync by bgwriter: TRUNCATE xlog
>> record immediately deletes the corresponding file, while it might be
>> scheduled to be fsynced by bgwriter. We should leave the actual file
>> deletion to bgwriter instead of startup process, like normal mode?
I think the real problem is this in mdunlink():
> /* Register request to unlink first segment later */
> if (!isRedo && forkNum == MAIN_FORKNUM)
When we replay the unlink of the relation, we don't tell bgwriter about
it. Normally we do, so bgwriter knows that if the fsync() fails with
ENOENT, it's ok since the file was deleted.
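To make the ENOENT handling concrete, here is a toy model in C. It is only a sketch and not the md.c code: ToyFile, toy_unlink() and toy_bgwriter_fsync() are names invented for illustration, and the "notify" flag stands in for whatever message tells bgwriter about the unlink.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one relation file plus bgwriter's knowledge of it. */
typedef struct {
    bool exists;            /* is the file still on disk? */
    bool unlink_registered; /* was bgwriter told that the file was deleted? */
} ToyFile;

/*
 * Delete the file. 'notify' models whether the deleting process also
 * tells bgwriter about it (hypothetical flag, for illustration only).
 */
static void toy_unlink(ToyFile *f, bool notify)
{
    assert(f != NULL);
    f->exists = false;
    if (notify)
        f->unlink_registered = true;
}

/*
 * Model of bgwriter processing a pending fsync request.
 * Returns 0 if the request is satisfied or can be forgotten,
 * otherwise the errno value that would be reported.
 */
static int toy_bgwriter_fsync(const ToyFile *f)
{
    int err = f->exists ? 0 : ENOENT;

    /* A missing file is fine only if we know it was deleted on purpose. */
    if (err == ENOENT && f->unlink_registered)
        return 0;
    return err;
}
```

With notify = true the pending fsync is quietly dropped; with notify = false, which is what the redo path amounts to in this model, the same fsync fails with ENOENT.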
It's tempting to just remove the "!isRedo" condition, but then we have
another problem: if bgwriter hasn't been started yet, and the shmem
queue is full, we get stuck in register_unlink() trying to send the
message and failing.
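The "stuck" case can be illustrated with a small bounded queue that has no consumer. Again this is a sketch under assumptions, not the actual shared-memory queue: ToyQueue, its size, and the max_attempts bound are all made up so that the example terminates; the retry loop described above would have no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_SIZE 4    /* arbitrary size for the sketch */

typedef struct {
    int    items[TOY_QUEUE_SIZE];
    size_t count;
    bool   consumer_running;    /* models whether bgwriter has been started */
} ToyQueue;

/* Try to enqueue once; fails if the queue is full. */
static bool toy_try_enqueue(ToyQueue *q, int item)
{
    assert(q != NULL);
    if (q->count >= TOY_QUEUE_SIZE)
        return false;
    q->items[q->count++] = item;
    return true;
}

/* The consumer empties the queue, but only if it is running at all. */
static void toy_consumer_drain(ToyQueue *q)
{
    if (q->consumer_running)
        q->count = 0;
}

/*
 * Sender that keeps retrying until there is room in the queue.
 * Returns the number of attempts used, or -1 if it gave up after
 * max_attempts (the bound exists only to keep the sketch finite).
 */
static int toy_send_with_retry(ToyQueue *q, int item, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (toy_try_enqueue(q, item))
            return attempt;
        /* Stand-in for sleeping while the consumer makes room. */
        toy_consumer_drain(q);
    }
    return -1;
}
```

With consumer_running = false and a full queue, toy_send_with_retry() never succeeds no matter how large max_attempts is. Once the consumer is running, the second attempt gets through.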
In archive recovery, we always start bgwriter at the beginning of WAL
replay. In crash recovery, we don't start bgwriter until the end of WAL
replay. So we could change the "!isRedo" condition to
"!InArchiveRecovery". It's not a very clean solution, but it's simple.
Hmm, what happens when the startup process performs a write, and
bgwriter is not running? Do the fsync requests queue up in the shmem
queue until the end of recovery when bgwriter is launched? I guess I'll
have to try it out...