Re: CreateLockFile() race condition

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CreateLockFile() race condition
Date: 2012-08-03 15:59:00
Message-ID: 18481.1344009540@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> The problem here is a race between concluding the assessment of a PID file as
> defunct and unlinking it; during that period, another postmaster may have
> replaced the PID file and proceeded. As far as I've been able to figure, this
> flaw is fundamental to any PID file invalidation algorithm relying solely on
> atomic filesystem operations like unlink(2), link(2), rename(2) and small
> write(2) for mutual exclusion. Do any of you see a way to remove the race?

Nasty. Still, the issue only exists for two postmasters launched at
just about exactly the same time, which is an unlikely case.

> I think we should instead implement postmaster mutual exclusion by way of
> fcntl(F_SETLK) on Unix and CreateFile(..., FILE_SHARE_READ, ...) on Windows.

I'm a bit worried about what new problems this solution is going to open
up. It seems not unlikely that the cure is worse than the disease.
Having locking that actually works on (some) NFS setups would be nice,
but ...

> The hazard[4] keeping fcntl locking from replacing the PGSharedMemoryIsInUse()
> check does not apply here, because the postmaster itself does not run
> arbitrary code that might reopen postmaster.pid.

False. See shared_preload_libraries.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-08-03 16:43:06 Re: CreateLockFile() race condition
Previous Message Noah Misch 2012-08-03 14:56:35 CreateLockFile() race condition