Neil Conway <neilc(at)samurai(dot)com> writes:
> I think it's worthwhile implementing this, if possible.
I wasn't objecting (I work for Red Hat, remember ;-)). I was just
saying there's a limit to the messiness I think we should accept.
>> The SysV API provides a reliable interlock to prevent this scenario:
>> we read the old shared memory block ID from the old postmaster's
>> postmaster.pid file, and look to see if that block (a) still exists
>> and (b) still has attached processes (presumably backends).
> If the postmaster is starting up and the segment still exists, could
> we assume that's an error condition, and force the admin to manually
> fix it?
It wasn't clear from your description whether large-TLB shmem segments
even have IDs that one could use to determine whether "the segment still
exists". If the segments are anonymous then how do you do that?
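
For reference, the interlock quoted above can be sketched roughly as follows. This is only an illustration, not the actual postmaster code: the function name `probe_segment` and the `SegState` codes are made up here, and it assumes the shared memory ID has already been parsed out of the old postmaster.pid file. It relies on the standard `shmctl(IPC_STAT)` call and the `shm_nattch` field, which is exactly what an anonymous segment without an ID would not give us.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
    SEG_GONE,       /* no segment with this ID exists any more */
    SEG_IDLE,       /* segment exists, but nothing is attached */
    SEG_IN_USE,     /* segment exists and has attached processes */
    SEG_UNKNOWN     /* could not tell (e.g. permission denied) */
} SegState;

/*
 * Ask the kernel about a SysV shared memory segment by ID.
 * shmid is assumed to come from the old postmaster.pid file.
 */
SegState
probe_segment(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
    {
        /* EINVAL/EIDRM: the ID is not valid or the segment was removed */
        if (errno == EINVAL || errno == EIDRM)
            return SEG_GONE;
        return SEG_UNKNOWN;
    }

    /* shm_nattch counts the processes currently attached */
    return (ds.shm_nattch > 0) ? SEG_IN_USE : SEG_IDLE;
}
```

Under this sketch, only SEG_IN_USE would be a reason to refuse to start; how to treat SEG_UNKNOWN is a policy choice that the sketch leaves open.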
> It does make the system less robust, but I'm suspicious of any
> attempts to automagically fix a situation in which we *know* something
> has gone seriously wrong...
We've spent a lot of effort on trying to ensure that we (a) start up
when it's safe and (b) refuse to start up when it's not safe. While (b)
is clearly the more critical point, backsliding on (a) isn't real nice
either. People don't like postmasters that randomly fail to start.
> Another possibility might be to still allocate a small SysV shmem
> area, and use that to provide the interlock, while we allocate the
> buffer area using sys_alloc_hugepages. That's somewhat of a hack, but
> I think it would resolve the interlock problem, at least.
Not a bad idea ... I have not got a better one offhand ... but watch
out for SHMMIN settings.
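
A rough sketch of what the small interlock segment might look like, with the SHMMIN caveat in mind. The function name `create_interlock_segment` and the doubling retry are assumptions made for illustration only; the large buffer area would be allocated separately and is not shown. Note that `shmget` reports EINVAL both for a request below SHMMIN and for one above SHMMAX, so the retry is capped by a caller-supplied `max_size`.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a small SysV segment whose only job is to carry the
 * "are backends still attached?" interlock.  If the kernel rejects
 * the size with EINVAL (possibly because it is below SHMMIN), retry
 * with a doubled size until max_size is reached.
 *
 * Returns the segment ID, or -1 with errno set on failure.
 */
int
create_interlock_segment(key_t key, size_t size, size_t max_size)
{
    if (size == 0)
        size = 1;               /* zero would never grow when doubled */

    for (;;)
    {
        int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

        if (id >= 0)
            return id;

        /* Anything other than EINVAL is not a size problem: give up */
        if (errno != EINVAL || size >= max_size)
            return -1;

        size *= 2;
    }
}
```

Something like create_interlock_segment(key, 1, 65536) would start from the smallest possible request and grow it only as far as the kernel insists.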
regards, tom lane