Proposal for fixing IPC key assignment

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposal for fixing IPC key assignment
Date: 2000-11-26 02:55:18
Message-ID: 27865.975207318@sss.pgh.pa.us
Lists: pgsql-hackers

I looked over the last discussion of selecting IPC keys for shared memory
and semaphores (pghackers thread "shmem_seq may be a bad idea" starting
4/30/00). There were some good ideas there, but the discussion still
assumed that there would be only one postmaster running on a given port
number on a system, so that the port number is an adequate unique
identifier to generate IPC keys from. That assumption has been broken
by the UUNET "virtual hosting" patch; furthermore, Joel Burton's recent
tale of woe reminds us that the interlock that keeps two postmasters from
starting on the same port number is not bulletproof anyway. So, here is
a new proposal that still works when multiple postmasters share a port
number.

It's nice to generate IPC keys that use the port number as high-order
digits, since in typical cases that makes it much easier to tell which IPC
objects belong to which postmaster. (I still dislike ftok-generated keys
for the same reasons I enumerated before: they don't guarantee uniqueness,
so they don't actually simplify life at all; and they do make it hard to
tell which IPC object is which.)

What we must be able to do is cope with collisions in selected key
numbers. The collision may be against a key number belonging to another
application, or one belonging to another still-active postmaster, or one
belonging to a dead postmaster, or one belonging to a previous reset cycle
of the current postmaster. We want to detect the last two cases and
attempt to free the no-longer-used IPC object, without risking breaking
things in the first two cases. If we cannot free the IPC object then we
must move on to a new key number and try again.

To identify shmem segments reliably, I propose we adopt a convention that
the first word of every Postgres shmem segment contain a magic number
(some constant we select at random) and the second word contain the PID
of the creating postmaster or standalone backend. The magic number will
help guard against misidentifying segments belonging to other applications.

Pseudocode for allocating a new shmem segment is as follows:

// Do this during startup or at the beginning of postmaster reset:
NextShmemSegID := port# * 1000;

// Do this to allocate each shared memory segment:
for (NextShmemSegID++ ; ; NextShmemSegID++)
{
    Attempt to create shmem seg with ID NextShmemSegID, desired size,
        and flags IPC_CREAT | IPC_EXCL;
    if (successful)
        break;
    if (error is not due to key collision)
        fail;
    Attempt to attach to shmem seg with ID NextShmemSegID;
    if (unsuccessful)
        continue;            // segment is some other app's
    if (first word is not correct magic number)
        detach and continue; // segment is some other app's
    if (second word is PID of some existing process other than me)
        detach and continue; // segment is some other postmaster's
    // segment appears to be from a dead postmaster or my previous cycle,
    // so try to get rid of it
    detach from segment;
    try to delete segment;
    if (unsuccessful)
        continue;            // segment does not belong to Postgres user
    Attempt to create shmem seg with ID NextShmemSegID, desired size,
        and flags IPC_CREAT | IPC_EXCL;
    if (successful)
        break;
    // Can only get here if some other process just grabbed the same
    // shmem key. Let him have that one, loop around to try another.
}
// NextShmemSegID is ID of successfully created segment;
// attach to it and set the PID and magic-number words, IN THAT ORDER.

Note that at each postmaster reset, we restart the NextShmemSegID
counter. Therefore, during a reset we will normally find the same
shmem keys we used on the last cycle, free them, and reuse them.
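
For concreteness, here is one way the allocation loop could be written against the SysV calls shmget(), shmat(), shmdt() and shmctl(). This is a sketch under assumptions, not a finished implementation: the names (ShmemHeader, PG_SHMEM_MAGIC, next_shmem_key, reset_shmem_keys, create_shmem_segment), the 0600 mode, the magic value, and the kill(pid, 0) liveness test are all hypothetical, and only EEXIST is treated as a key collision.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_SHMEM_MAGIC 0x4C9A71E3u      /* hypothetical magic number */

typedef struct { uint32_t magic; pid_t creator_pid; } ShmemHeader;

static key_t next_shmem_key;            /* NextShmemSegID in the pseudocode */

/* Call at startup and at the beginning of each postmaster reset. */
static void
reset_shmem_keys(int port)
{
    next_shmem_key = (key_t) port * 1000;
}

/* Assumption: kill(pid, 0) is an acceptable liveness probe. */
static bool
other_process_alive(pid_t pid)
{
    if (pid <= 0 || pid == getpid())
        return false;
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Try to delete the segment at 'key' if it is ours and abandoned. */
static bool
remove_if_stale(key_t key)
{
    int old = shmget(key, sizeof(ShmemHeader), 0);
    if (old < 0)
        return false;                   /* cannot even look it up */
    const ShmemHeader *hdr = shmat(old, NULL, SHM_RDONLY);
    if (hdr == (void *) -1)
        return false;                   /* some other app's segment */
    bool stale = hdr->magic == PG_SHMEM_MAGIC &&
                 !other_process_alive(hdr->creator_pid);
    shmdt(hdr);
    return stale && shmctl(old, IPC_RMID, NULL) == 0;
}

/* Returns the attached segment, or NULL on a non-collision failure. */
static void *
create_shmem_segment(size_t size, int *shmid_out)
{
    if (size < sizeof(ShmemHeader))
        return NULL;
    for (next_shmem_key++;; next_shmem_key++)
    {
        int shmid = shmget(next_shmem_key, size, IPC_CREAT | IPC_EXCL | 0600);
        if (shmid < 0 && errno != EEXIST)
            return NULL;                /* not a key collision: give up */
        if (shmid < 0)
        {
            if (!remove_if_stale(next_shmem_key))
                continue;               /* in use, or not ours to delete */
            shmid = shmget(next_shmem_key, size, IPC_CREAT | IPC_EXCL | 0600);
            if (shmid < 0)
                continue;               /* someone else just grabbed the key */
        }
        ShmemHeader *hdr = shmat(shmid, NULL, 0);
        if (hdr == (void *) -1)
        {
            shmctl(shmid, IPC_RMID, NULL);
            return NULL;
        }
        hdr->creator_pid = getpid();    /* PID first ... */
        hdr->magic = PG_SHMEM_MAGIC;    /* ... then magic, in that order */
        *shmid_out = shmid;
        return hdr;
    }
}
```

Writing the PID before the magic number means that any process which sees the correct magic number also sees a valid creator PID.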

The magic-number word is not really necessary; it just improves the
odds that we won't clobber some other app's shared mem. To get into
trouble that way, the other app would have to (a) be running as the
same user as Postgres (else we'll not be able to delete its shmem);
(b) use one of the same shmem keys as we do; and (c) have the magic
number as the value of the first word in its shmem. (a) and (b)
are already pretty unlikely, but I like the extra bit of assurance.

With this scheme we are not dependent at all on the assumption of
different postmasters having different port numbers. Running multiple
postmasters on the same port number has no consequences worse than
slightly slowing down postmaster startup while we search for currently
unused shmem keys.

This scheme also fixes our problems with dying if there is an existing
shmem segment of the right key but wrong size, as can happen after a
version upgrade or change of -N or -B parameters. Since the scheme always
deletes and recreates an old segment, rather than trying to use it as-is,
it handles size changes automatically.

The exact same logic can be applied to assignment of IPC semaphore
sets, but there is a small difficulty to be resolved: where do we
put the identification information (magic number and creator's PID)?
I propose that we allocate one extra semaphore in each semaphore set
--- ie, 17 per set instead of 16 --- to hold this info. This extra
semaphore is never touched during normal operations. During creation
of a semaphore set, the creating process does
    semctl(semid, 16, SETVAL, PGSEMMAGIC-1);
followed by a semop() increment of that semaphore. This leaves the
extra semaphore with a count of PGSEMMAGIC and a sempid referencing
the creating process. These values can be read nondestructively
by other postmasters using semctl(). We will attempt to free an old
semaphore set only if it has exactly 17 semaphores and the last one
has the right count and a sempid that doesn't refer to another live
process.
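
The semaphore tagging could be sketched as follows. Again this is only an illustration under assumptions: PGSEMMAGIC is given an arbitrary small value, the names SEMS_PER_SET, pg_semun, create_tagged_semset and semset_is_stale are hypothetical, the 0600 mode is a guess, and kill(pid, 0) stands in for whatever liveness test is finally chosen. The union is declared locally because some platforms leave the semctl() argument union to the caller.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <unistd.h>

#define SEMS_PER_SET 16     /* real semaphores; the tag is number 16 */
#define PGSEMMAGIC   537    /* hypothetical; must not exceed SEMVMX */

/* Locally named stand-in for the semctl() argument union. */
union pg_semun
{
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Create a set of 17 semaphores and tag the extra one. */
static int
create_tagged_semset(key_t key)
{
    int semid = semget(key, SEMS_PER_SET + 1, IPC_CREAT | IPC_EXCL | 0600);
    if (semid < 0)
        return -1;

    union pg_semun arg;
    struct sembuf  incr = { SEMS_PER_SET, 1, 0 };

    arg.val = PGSEMMAGIC - 1;
    /* SETVAL to magic-1, then a semop() increment so that sempid
     * refers to this process and the count ends up at PGSEMMAGIC. */
    if (semctl(semid, SEMS_PER_SET, SETVAL, arg) < 0 ||
        semop(semid, &incr, 1) < 0)
    {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* True if the set carries our tag and its creator is dead or is us. */
static bool
semset_is_stale(int semid)
{
    struct semid_ds ds;
    union pg_semun  arg;

    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) < 0)
        return false;
    if (ds.sem_nsems != SEMS_PER_SET + 1)
        return false;                   /* wrong size: not ours */
    if (semctl(semid, SEMS_PER_SET, GETVAL) != PGSEMMAGIC)
        return false;                   /* wrong count: not ours */

    pid_t creator = semctl(semid, SEMS_PER_SET, GETPID);
    if (creator <= 0)
        return false;                   /* cannot identify the creator */
    if (creator != getpid() && (kill(creator, 0) == 0 || errno == EPERM))
        return false;                   /* another live process owns it */
    return true;
}
```

Both reads in semset_is_stale() use semctl() only, so checking a set does not disturb the semaphore values of a postmaster that is still using it.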

We have to assign PGSEMMAGIC small enough to ensure that it won't fall
foul of SEMVMX, but that probably isn't a big problem. A more serious
potential portability issue is that some implementations might not
support the semctl() GETPID command (i.e., get the PID of the last
process that did a semop() on that semaphore). It seems like a pretty
basic part of the SysV semaphore functionality to me, but ... Anyone
know of any platforms where that's missing?

regards, tom lane
