After a system crash on a RH 7.2 box (2.4.7-10 kernel), I found that
Postgres would not restart, complaining that it "found a pre-existing
shared memory block (ID so-and-so) still in use."

This is coming from code that attempts to defend against the scenario
where the postmaster crashed but one or more backends are still alive.
If we start a new postmaster and create a new shmem segment, the
consequences will be absolutely disastrous, because the old and new
backends will be modifying the same data files with no coordination.
So we look to see if the old shmem segment (whose ID is recorded in
the data directory lockfile) is still present and if so whether there
are any processes attached to it. See SharedMemoryIsInUse() in

The problem is that SharedMemoryIsInUse() expects shmctl to return
errno == EINVAL if the presented shmem segment ID is invalid. What
Linux 2.4.7 is actually returning is EIDRM (identifier removed).
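
To make the failure mode concrete, here is a minimal sketch of the kind
of check being described. It is not the actual PostgreSQL source: the
function name segment_may_be_in_use() is hypothetical, and the logic is
only my reading of the description above (EINVAL means "no such
segment", anything else is treated conservatively).

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper (assumed name, not the PostgreSQL function):
 * report whether the shmem segment with the given ID might still
 * have processes attached to it.
 */
static bool
segment_may_be_in_use(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
    {
        /* EINVAL: the ID names no segment, so nothing can be attached. */
        if (errno == EINVAL)
            return false;

        /*
         * Any other errno, including EIDRM, is treated as "possibly in
         * use".  This conservative branch is what refuses the restart
         * when the kernel reports EIDRM instead of EINVAL.
         */
        return true;
    }

    /* The segment exists; it is in use if any process is attached. */
    return buf.shm_nattch > 0;
}
```

With a check shaped like this, a kernel that answers EIDRM for a stale
ID lands in the conservative branch, so the new postmaster refuses to
start even though no old backends exist.
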
The easy "fix" of taking EIDRM to be an allowable return code scares
me. At least on HPUX, the documented implication of this return code
is that the shmem segment is marked for deletion but is not yet gone
because there are still processes attached to it. That would be
exactly the scenario after a postmaster crash and manual "ipcrm" if
there were any old backends still alive. So, it seems to me that
accepting EIDRM would defeat the entire point of this test, at least
on some platforms.

Comments? Is 2.4.7 simply broken and returning the wrong errno?
If not, what should we do?

regards, tom lane