Neil Conway <neilc(at)samurai(dot)com> writes:
> I'd like to enable PostgreSQL to use large TLB pages, if the OS and
> processor support them.
Hmm ... it seems interesting, but I'm hesitant to do a lot of work
to support something that's only available on one hardware-and-OS
combination. (If we were talking about a Windows-specific hack,
you'd already have lost the audience, no? But I digress.)
> So as I understand it, we would basically replace the calls to
> shmget(), shmdt(), etc. with these system calls. The behavior will be
> slightly different, however -- I'm not sure if this API supports
> everything we expect the SysV IPC API to support (e.g. telling the #
> of clients attached to a given segment).
I trust it at least supports inheriting the page mapping over a fork()?
> Can anyone comment on
> exactly what functionality we expect when dealing with the storage
> mechanism of the shared buffer?
The only thing we use beyond the obvious "here's some memory accessible
by both parent and child processes" is the #-of-clients functionality
you mentioned. That matters because it provides a
safety interlock against the case where a postmaster has crashed but
left child backends running. If a new postmaster is started and launches
its own collection of children, then we are in very hot water,
because the old and new backend sets will be modifying the same database
files without any mutual awareness or interlocks. This *will* lead to
serious, possibly unrecoverable database corruption.
The SysV API provides a reliable interlock to prevent this scenario:
we read the old shared memory block ID from the old postmaster's
postmaster.pid file, and look to see if that block (a) still exists
and (b) still has attached processes (presumably backends). If it's
gone or has no attached processes, it's safe for the new postmaster
to continue startup.
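
To make the check described above concrete, here is a minimal sketch in C of how such a test can be written with shmctl(IPC_STAT) and the shm_nattch field. The helper name segment_in_use and its return convention are assumptions made for illustration, not the actual postmaster code, and the choice to report other errors (such as EACCES) to the caller rather than interpret them is likewise an assumption.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper (not the actual PostgreSQL function).
 *
 * Returns 1 if the segment exists and has attached processes,
 *         0 if the segment is gone or has no attached processes,
 *        -1 if the state could not be determined (e.g. EACCES).
 *
 * shmid is assumed to be the ID read from the old postmaster.pid file.
 */
static int
segment_in_use(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
    {
        /* EINVAL or EIDRM: no such segment any more */
        if (errno == EINVAL || errno == EIDRM)
            return 0;
        /* Anything else: let the caller decide how cautious to be */
        return -1;
    }

    /* shm_nattch is the kernel's count of current attachments */
    return buf.shm_nattch > 0 ? 1 : 0;
}
```

A new postmaster would continue startup only when this returns 0. The attach count is maintained by the kernel and drops when a process exits, so there is no window in which a live backend goes uncounted.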
I have little love for the SysV shmem API, but I haven't thought of
an equivalently reliable interlock for this scenario without it.
(For example, something along the lines of requiring each backend
to write its PID into a file isn't very reliable at all: it leaves
a window at each backend start where the backend hasn't yet written
its PID, and it increases by a large factor the risk we've already
seen wherein stale PID entries in lockfiles might by chance match the
PIDs of other, unrelated processes.)
Any ideas for better answers?
regards, tom lane