Re: making use of large TLB pages

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-28 15:05:45
Message-ID: 12491.1033225545@sss.pgh.pa.us
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> If we used a key that would remain the same between runs of the
> postmaster, this should ensure that there isn't a possibility of two
> independent sets of backends operating on the same data dir. The most
> logical way to do this IMHO would be to just hash the data dir, but I
> suppose the current method of using the port number should work as
> well.

You should stick as closely as possible to the key logic currently used
for SysV shmem keys. That logic is intended to cope with the case where
someone else is already using the key# that we initially generate, as
well as the case where we discover a collision with a pre-existing
backend set. (We tell the difference by looking for a magic number at
the start of the shmem segment.)
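
As a rough illustration of that key logic (not the actual code in the tree), the loop can be sketched as below. Every name here is hypothetical: `KeyStatus`, `ProbeFn`, `choose_segment_key()` and the table-driven `fake_probe()` stand in for whatever the real syscalls and the magic-number check would report.

```c
#include <stddef.h>

/* What probing one candidate key found (hypothetical classification). */
typedef enum {
    KEY_FREE,     /* no segment exists under this key */
    KEY_FOREIGN,  /* a segment exists, but without the PG magic number */
    KEY_OURS      /* a segment exists and carries the PG magic number */
} KeyStatus;

/* Outcome of the whole search. */
typedef enum { KEY_CHOSEN = 0, KEY_COLLISION = 1, KEY_EXHAUSTED = 2 } KeyChoice;

/* Callback that inspects one key; a real one would issue the syscalls. */
typedef KeyStatus (*ProbeFn)(long key, void *arg);

/*
 * Try first_key, first_key + 1, ... up to max_tries candidates.
 * A foreign segment just means "try the next key"; a segment with our
 * magic number means a pre-existing backend set, so the search stops.
 * On KEY_CHOSEN or KEY_COLLISION, *key_out holds the key in question.
 */
static KeyChoice
choose_segment_key(long first_key, int max_tries, ProbeFn probe,
                   void *arg, long *key_out)
{
    for (int i = 0; i < max_tries; i++)
    {
        long key = first_key + i;

        switch (probe(key, arg))
        {
            case KEY_FREE:
                *key_out = key;
                return KEY_CHOSEN;
            case KEY_OURS:
                *key_out = key;
                return KEY_COLLISION;
            case KEY_FOREIGN:
                break;          /* someone else owns it: keep looking */
        }
    }
    return KEY_EXHAUSTED;
}

/* Table-driven stand-in for the real probe, for illustration only. */
typedef struct { long base; const KeyStatus *status; int n; } FakeKeys;

static KeyStatus
fake_probe(long key, void *arg)
{
    const FakeKeys *fk = arg;
    long idx = key - fk->base;

    if (idx < 0 || idx >= fk->n)
        return KEY_FREE;        /* keys outside the table are unused */
    return fk->status[idx];
}
```

Because the chosen key can differ from the first candidate, the caller has to record whichever key the search settled on.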

Note that we do not assume the key is the same on each run; that's why
we store it in postmaster.pid.

> (1) call sys_alloc_hugepages() without IPC_EXCL. If it returns
> an error, we're in the clear: there's no page matching
> that key. If it returns a pointer to a previously existing
> segment, panic: it is very likely that there are some
> orphaned backends still active.

s/panic/and the PG magic number appears in the segment header, panic/
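
A minimal sketch of that amended test, assuming a header layout and magic value invented purely for illustration (`SketchSegHeader` and `SKETCH_SHMEM_MAGIC` are not the real definitions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value and header layout, for illustration only. */
#define SKETCH_SHMEM_MAGIC 0x2A5EC0DEu

typedef struct {
    uint32_t magic;        /* identifies the segment as one of ours */
    int32_t  creator_pid;  /* assumed extra field, unused in this sketch */
} SketchSegHeader;

/* True if the mapped segment starts with the expected magic number. */
static bool
segment_has_pg_magic(const void *seg, size_t len)
{
    SketchSegHeader hdr;

    if (seg == NULL || len < sizeof(hdr))
        return false;
    memcpy(&hdr, seg, sizeof(hdr));   /* avoid alignment assumptions */
    return hdr.magic == SKETCH_SHMEM_MAGIC;
}

/*
 * Panic only when a segment already exists under the key AND it carries
 * the magic number. existing_seg is NULL when the allocation call
 * reported that no segment matches the key.
 */
static bool
must_panic(const void *existing_seg, size_t len)
{
    return existing_seg != NULL && segment_has_pg_magic(existing_seg, len);
}
```

A pre-existing segment without the magic number belongs to someone else, so the right response in that case is to move on to another key rather than panic.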

> - if we're compiling on a Linux system but the kernel headers
> don't define the syscalls we need, use some reasonable
> defaults (e.g. the syscall numbers for the current hugepage
> syscalls in Linux 2.5)

I think this is overkill, and quite possibly dangerous. If we don't see
the symbols then don't try to compile the code.
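
Something along these lines is what I have in mind. The macro names `__NR_alloc_hugepages` and `__NR_free_hugepages` are an assumption about what the kernel headers would define, and `HAVE_HUGEPAGE_SYSCALLS` and `hugepages_compiled_in()` are made-up names for this sketch.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/syscall.h>    /* pulls in the __NR_* syscall numbers */
#endif

/*
 * Compile the hugepage path only when the headers supply both syscall
 * numbers. No hard-wired fallback numbers are defined here on purpose.
 */
#if defined(__NR_alloc_hugepages) && defined(__NR_free_hugepages)
#define HAVE_HUGEPAGE_SYSCALLS 1
#else
#define HAVE_HUGEPAGE_SYSCALLS 0
#endif

/* Lets the controlling logic ask whether the hugepage code exists at all. */
static bool
hugepages_compiled_in(void)
{
    return HAVE_HUGEPAGE_SYSCALLS != 0;
}
```

When the symbols are missing, the build simply contains only the SysV path.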

On the whole it seems that this allows a very nearly one-to-one mapping
to the existing SysV functionality. We don't have the "number of
connected processes" syscall, perhaps, but we don't need it: if a
hugepages segment exists we can assume the number of connected processes
is greater than 0, and that's all we really need to know.
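
To spell out that reasoning, here is a small sketch of the in-use decision for both kinds of segment. `SegKind`, `SegProbe` and `segment_in_use()` are hypothetical names; for the SysV case the attach count is assumed to come from `shmctl(IPC_STAT)` (the `shm_nattch` field).

```c
#include <stdbool.h>

typedef enum { SEG_SYSV, SEG_HUGEPAGE } SegKind;

/* What the caller learned about the segment recorded for a data dir. */
typedef struct {
    SegKind       kind;
    bool          exists;  /* does a segment exist under the stored key? */
    unsigned long nattch;  /* attach count; only meaningful for SEG_SYSV */
} SegProbe;

/*
 * SysV: in use if at least one process is still attached.
 * Hugepage: no attach count is available, so existence alone is taken
 * to mean that the number of connected processes is greater than 0.
 */
static bool
segment_in_use(const SegProbe *p)
{
    if (!p->exists)
        return false;
    if (p->kind == SEG_HUGEPAGE)
        return true;
    return p->nattch > 0;
}
```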

I think it's okay to stuff this support into the existing
port/sysv_shmem.c file, rather than make a separate file (particularly
given your point that we have to be able to fall back to SysV calls at
runtime). I'd suggest reorganizing the code in that file slightly to
separate the actual syscalls from the controlling logic in
PGSharedMemoryCreate(). You will probably also have to extend the API for
PGSharedMemoryIsInUse() and RecordSharedMemoryInLockFile() to allow
three fields to be recorded in postmaster.pid, not two --- you'll want
a boolean indicating whether the stored key is for a SysV or hugepage
segment.
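
For the lock file part, a sketch of a three-field record might look like the following. The line format and the names `ShmemRecord`, `format_shmem_record()` and `parse_shmem_record()` are invented for illustration and are not the existing postmaster.pid layout.

```c
#include <stdbool.h>
#include <stdio.h>

/* The three fields: key, segment id, and which kind of segment it is. */
typedef struct {
    unsigned long key;
    unsigned long id;
    bool          is_hugepage;  /* false = SysV, true = hugepage */
} ShmemRecord;

/* Write "key id kind\n" into buf; returns 0 on success, -1 if it won't fit. */
static int
format_shmem_record(const ShmemRecord *r, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%lu %lu %d\n",
                     r->key, r->id, r->is_hugepage ? 1 : 0);

    return (n > 0 && (size_t) n < buflen) ? 0 : -1;
}

/* Parse a line written by format_shmem_record(); returns 0 on success. */
static int
parse_shmem_record(const char *line, ShmemRecord *r)
{
    unsigned long key;
    unsigned long id;
    int           kind;

    if (sscanf(line, "%lu %lu %d", &key, &id, &kind) != 3)
        return -1;              /* fewer than three fields present */
    if (kind != 0 && kind != 1)
        return -1;              /* unknown segment kind */
    r->key = key;
    r->id = id;
    r->is_hugepage = (kind == 1);
    return 0;
}
```

With the kind recorded next to the key, the in-use check at startup knows which set of calls to use when it re-examines the stored key.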

regards, tom lane
