Re: POSIX shared memory redux

From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POSIX shared memory redux
Date: 2011-04-14 14:26:33
Message-ID: BC618525-BB86-41BE-B8B4-D22419C99C45@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Apr 13, 2011, at 9:30 PM, Robert Haas wrote:

> On Wed, Apr 13, 2011 at 6:11 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>>> I don't see why we need to get rid of SysV shared memory; needing less
>>> of it seems just as good.
>>
>> 1. As long one keeps SysV shared memory around, the postgresql project has to maintain the annoying platform-specific document on how to configure the poorly named kernel parameters. If the SysV region is very small, that means I can run more postgresql instances within the same kernel limits, but one can still hit the limits. My patch allows the postgresql project to delete that page and the hassles with it.
>>
>> 2. My patch proves that SysV is wholly unnecessary. Are you attached to it? (Pun intended.)
>
> With all due respect, I think this is an unproductive conversation.
> Your patch proves that SysV is wholly unnecessary only if we also
> agree that fcntl() locking is just as reliable as the nattch
> interlock, and Tom and I are trying to explain why we don't believe
> that's the case. Saying that we're just wrong without responding to
> our points substantively doesn't move the conversation forward.

Sorry- it wasn't meant to be an attack- just a dumb pun. I am trying to argue that, even if the fcntl is unreliable, the startup procedure is just as reliable as it is now. The reasons being:

1) the SysV nattch method's primary purpose is to protect the shmem region. This is no longer necessary in my patch because the shared memory in unlinked immediately after creation, so only the initial postmaster and its children have access.

2) the standard postgresql lock file remains the same

Furthermore, there is indeed a case where the SysV nattch cannot work while the fcntl locking can indeed catch: if two separate machines have a postgresql data directory mounted over NFS, postgresql will currently allow both machines to start a postmaster in that directory because the SysV nattch check fails and then the pid in the lock file is the pid on the first machine, so postgresql will say "starting anyway". With fcntl locking, this can be fixed. SysV only has presence on one kernel.

>
> In case it's not clear, here again is what we're concerned about: A
> System V shm *cannot* be removed until nobody is attached to it. A
> lock file can be removed, or the lock can be accidentally released by
> the apparently innocuous operation of closing a file descriptor.
>
>> Both you and Tom have somehow assumed that the patch alters current postgresql behavior. In fact, the opposite is true. I haven't changed any of the existing behavior. The "robust" behavior remains. I merely added fcntl interlocking on top of the lock file to replace the SysV shmem check.
>
> This seems contradictory. If you replaced the SysV shmem check, then
> it's not there, which means you altered the behavior.

From what I understood, the primary purpose of the SysV check was to protect the shared memory from multiple stompers. The interlock was a neat side-effect.

The lock file contents are currently important to get the pid of a potential, conflicting postmaster. With the fcntl API, we can return a live conflicting PID (whether a postmaster or a stuck child), so that's an improvement. This could be used, for example, for STONITH, to reliably kill a dying replication clone- just loop on the pids returned from the lock.

Even if the fcntl check passes, the pid in the lock file is checked, so the lock file behavior remains the same.

If you were to implement a daemon with a shared data directory but no shared memory, how would implement the interlock? Would you still insist on SysV shmem? Unix daemons generally rely on lock files alone. Perhaps there is a different API on which we can agree.

Cheers,
M

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message A.M. 2011-04-14 14:34:14 Re: POSIX shared memory redux
Previous Message Noah Misch 2011-04-14 12:36:19 Re: pg_dump --binary-upgrade vs. ALTER TYPE ... DROP ATTRIBUTE