Problem after removal of exec(), help

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: hackers(at)postgreSQL(dot)org (PostgreSQL-development)
Subject: Problem after removal of exec(), help
Date: 1998-06-22 14:45:25
Message-ID: 199806221445.KAA13553@candle.pha.pa.us
Lists: pgsql-hackers

Since the removal of exec(), Thomas has seen, and I have confirmed, that
if a backend crashes and the postmaster must reset the shared memory,
no backends can connect anymore. One way to reproduce it is to run the
regression tests, which crash on their last test for an unrelated
reason. After that crash, no new backends can be started.

The error it gets is:

Failed Assertion("!((((unsigned long)nextElem) > ShmemBase)):", File: "shmqueue.c", Line: 83)
!((((unsigned long)nextElem) > ShmemBase)) (0) [No such file or directory]

In this case nextElem = ShmemBase, so it is not greater. Removing the
Assert() still does not make things work, so there must be something
else.
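
To make the failing condition concrete, here is a minimal sketch of a
strict "above the base" check. It assumes queue links are stored as
offsets from the segment base, so that a zero offset maps to the base
address itself; the names offset_to_addr and addr_is_valid are
hypothetical and are not the real shmqueue.c code.

```c
#include <stdbool.h>

/* Hypothetical: turn an offset stored in shared memory into an address. */
static unsigned long
offset_to_addr(unsigned long base, unsigned long offset)
{
    return base + offset;
}

/* Hypothetical stand-in for the asserted condition: an address is
 * accepted only if it lies strictly above the segment base. */
static bool
addr_is_valid(unsigned long base, unsigned long addr)
{
    return addr > base;
}
```

Under that assumption, a link whose offset is zero (for example, one
that was never initialized) yields an address equal to the base and
fails the check, which would match nextElem = ShmemBase.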

Now, the problem is probably not at that exact spot, but somewhere
deeper. There are two differences between the old exec() behavior
and the new no-exec behavior. First, in the old setup each backend had
all its global variables freshly initialized, while in the new no-exec
case they take the global variable values from the postmaster. Second,
the old setup had each backend attaching to the shared memory, while the
new setup has them inheriting the shared memory from the fork().
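
The first difference can be shown with a small sketch: a child created
by fork() without a following exec() sees whatever value the parent had
stored in a global, not the value the program was loaded with. The
global demo_flag and the helper below are hypothetical and only
illustrate the fork() semantics, not anything in the postmaster.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int demo_flag = 0;   /* hypothetical global; 0 is its load-time value */

/* Set the global in the parent, fork, and return the value the child
 * observed (passed back through its exit status, so use 0..255).
 * Returns -1 on failure. */
static int
value_seen_by_forked_child(int parent_value)
{
    pid_t   pid;
    int     status;

    demo_flag = parent_value;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(demo_flag);   /* child: report the inherited value */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With an exec() in the child, the new program image would start with
demo_flag back at 0, which is the kind of fresh initialization that
could have hidden stale state in the old setup.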

My guess is that the reset code in postmaster.c has never reset things
completely, but the initialization of the global variables in each
backend was masking the bug, or that the attach() operation did some
extra work that we now need to do when resetting the shared memory:

static void
reset_shared(short port)
{
    ipc_key = port * 1000 + shmem_seq * 100;
    CreateSharedMemoryAndSemaphores(ipc_key);
    ActiveBackends = FALSE;
    shmem_seq += 1;
    if (shmem_seq >= 10)
        shmem_seq -= 10;
}
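
For reference, the key arithmetic in reset_shared() can be restated on
its own. This is only a sketch: key_for and next_seq are hypothetical
helpers, and port 5432 below is just an example value.

```c
/* Hypothetical restatement of the key computation in reset_shared(). */
static int
key_for(short port, int seq)
{
    return port * 1000 + seq * 100;
}

/* Hypothetical restatement of how shmem_seq advances: 0..9, then wraps. */
static int
next_seq(int seq)
{
    seq += 1;
    if (seq >= 10)
        seq -= 10;
    return seq;
}
```

For port 5432 this gives keys 5432000, 5432100, and so on up to
5432900, after which the sequence wraps back to 5432000. So each reset
creates the shared memory under a different key than the one before it.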

I am stumped on this.

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
