Re: Posix Shared Mem patch

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Posix Shared Mem patch
Date: 2012-06-27 11:41:55
Message-ID: CABUevEzqvh+6MV87YmXHF3MVjY9=hRh_dVAmVLPdUs_nrZM_jQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "A.M." <agentm(at)themactionfaction(dot)com> writes:
>> On 06/26/2012 07:30 PM, Tom Lane wrote:
>>>> I solved this via fcntl locking.
>
>>> No, you didn't, because fcntl locks aren't inherited by child processes.
>>> Too bad, because they'd be a great solution otherwise.
>
>> You claimed this last time and I replied:
>> http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php
>
>> "I address this race condition by ensuring that a lock-holding violator
>> is the postmaster or a postmaster child. If such as condition is
>> detected, the child exits immediately without touching the shared
>> memory. POSIX shmem is inherited via file descriptors."
>
>> This is possible because the locking API allows one to request which PID
>> violates the lock. The child expects the lock to be held and checks that
>> the PID is the parent. If the lock is not held, that means that the
>> postmaster is dead, so the child exits immediately.
>
> OK, I went back and re-read the original patch, and I now agree that
> something like this is possible --- but I don't like the way you did
> it. The dependence on particular PIDs seems both unnecessary and risky.
>
> The key concept here seems to be that the postmaster first stakes a
> claim on the data directory by exclusive-locking a lock file.  If
> successful, it reduces that lock to shared mode (which can be done
> atomically, according to the SUS fcntl specification), and then holds
> the shared lock until it exits.  Spawned children will not initially
> have a lock, but what they can do is attempt to acquire shared lock on
> the lock file.  If fail, exit.  If successful, *check to see that the
> parent postmaster is still alive* (ie, getppid() != 1).  If so, the
> parent must have been continuously holding the lock, and the child has
> successfully joined the pool of shared lock holders.  Otherwise bail
> out without having changed anything.  It is the "parent is still alive"
> check, not any test on individual PIDs, that makes this work.
>
> There are two concrete reasons why I don't care for the
> GetPIDHoldingLock() way.  Firstly, the fact that you can get a blocking
> PID from F_GETLK isn't an essential part of the concept of file locking
> IMO --- it's just an incidental part of this particular API.  May I
> remind you that the reason we're stuck on SysV shmem in the first place
> is that we decided to depend on an incidental part of that API, namely
> nattch?  I would like to not require file locking to have any semantics
> more specific than "a process can hold an exclusive or a shared lock on
> a file, which is auto-released at process exit".  Secondly, in an NFS
> world I don't believe that the returned l_pid value can be trusted for
> anything.  If it's a PID from a different machine then it might
> accidentally conflict with one on our machine, or not.
>
> Reflecting on this further, it seems to me that the main remaining
> failure modes are (1) file locking doesn't work, or (2) idiot DBA
> manually removes the lock file.  Both of these could be ameliorated
> with some refinements to the basic idea.  For (1), I suggest that
> we tweak the startup process (only) to attempt to acquire exclusive lock
> on the lock file.  If it succeeds, we know that file locking is broken,
> and we can complain.  (This wouldn't help for cases where cross-machine
> locking is broken, but I see no practical way to detect that.)
> For (2), the problem really is that the proposed patch conflates the PID
> file with the lock file, but people are conditioned to think that PID
> files are removable.  I suggest that we create a separate, permanently
> present file that serves only as the lock file and doesn't ever get
> modified (it need have no content other than the string "Don't remove
> this!").  It'd be created by initdb, not by individual postmaster runs;
> indeed the postmaster should fail if it doesn't find the lock file
> already present.  The postmaster PID file should still exist with its
> current contents, but it would serve mostly as documentation and as
> server-contact information for pg_ctl; it would not be part of the data
> directory locking mechanism.
>
> I wonder whether this design can be adapted to Windows?  IIRC we do
> not have a bulletproof data directory lock scheme for Windows.
> It seems like this makes few enough demands on the lock mechanism
> that there ought to be suitable primitives available there too.

I assume you're saying we need to make changes in the internal API,
right? Because we alreayd have a windows native implementation of
shared memory that AFAIK works, so if the new Unix stuff can be done
with the same internal APIs, it shouldn't nede to be changed. (Sorry,
haven't followed the thread in detail)

If so - can we define exactly what properties it is we *need*?

(A native API worth looking at is e.g.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx
- but there are probably others as well if that one doesn't do)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2012-06-27 11:57:06 Re: plpython issue with Win64 (PG 9.2)
Previous Message Pavel Stehule 2012-06-27 11:37:35 Re: proof concept - access to session variables on client side