Re: Linux max on shared buffers?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, GB Clark <postgres(at)vsservices(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, glenebob(at)nwlink(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Linux max on shared buffers?
Date: 2002-07-19 15:38:10
Message-ID: 13083.1027093090@sss.pgh.pa.us
Lists: pgsql-general

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> So far so good. Now what do you map when? Can you map multiple
> noncontiguous 8K blocks out of each file? If so, how do you coordinate
> that all backends in summary use at maximum the number of blocks you
> want PostgreSQL to use (each unique block counts, regardless of how many
> backends have it mmap()'d, right?). And if a backend needs a block and
> the max is reached already, how does it tell the other backends to unmap
> something?

Just to throw some additional wrenches into the gears: some platforms
(e.g. HPPA) have strong restrictions on where you can mmap stuff.
I quote some interesting material from the HP-UX mmap(2) man page below.
Possibly these restrictions could be worked around, but it looks
painful.

Note that platforms where these restrictions don't exist are not
automatically better: that just means that they're willing to swap a
larger number of address translation table entries for each process
dispatch. If we tried to mmap disk file blocks individually, our
process dispatch time would go to hell; but trying to map large ranges
instead (to hold down the number of translation entries) is going to
have a bunch of problems too.

regards, tom lane

If the size of the mapped file changes after the call to mmap(), the
effect of references to portions of the mapped region that correspond
to added or removed portions of the file is unspecified.

[ hence, any extension of a file requires re-mmaping; moreover
it appears that you cannot extend a file *at all* via mmap,
but must do so via write, after which you can re-mmap -- tgl ]
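
The extend-then-remap sequence described in the note above can be sketched as follows. This is a minimal illustration in Python, not PostgreSQL code: the helper name `extend_and_remap`, the temporary file, and the 8K `BLOCK_SIZE` are assumptions chosen for the example, and it shows only the portable pattern (grow the file with write(), then create a fresh mapping), not any HP-UX-specific behavior.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 8192  # assumed 8K block size, as in the question being answered


def extend_and_remap(fd, old_map, block):
    """Append one block with write(), then replace the old mapping
    with a new one that covers the whole (now larger) file.

    Hypothetical helper for illustration only.
    """
    os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, block)              # the file grows via write(), not via the mapping
    if old_map is not None:
        old_map.close()              # drop the mapping made for the old file size
    new_size = os.fstat(fd).st_size
    return mmap.mmap(fd, new_size)   # default access is a shared read/write mapping


fd, path = tempfile.mkstemp()        # mkstemp opens the file read/write
try:
    os.write(fd, b"A" * BLOCK_SIZE)  # file starts out one block long
    m = mmap.mmap(fd, BLOCK_SIZE)
    first_len = len(m)

    m = extend_and_remap(fd, m, b"B" * BLOCK_SIZE)
    second_len = len(m)
    first_new_byte = m[BLOCK_SIZE:BLOCK_SIZE + 1]
    m.close()
finally:
    os.close(fd)
    os.remove(path)
```

With many backends mapping the same file, every one of them would have to repeat the remap step after any extension, which is part of why this looks painful.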

...

Because the PA-RISC memory architecture utilizes a globally shared
virtual address space between processes, and discourages multiple
virtual address translations to the same physical address, all
concurrently existing MAP_SHARED mappings of a file range must share
the same virtual address offsets and hardware translations. PA-RISC-
based HP-UX systems allocate virtual address ranges for shared memory
and shared mapped files in the range 0x80000000 through 0xefffffff.
This address range is used globally for all memory objects shared
between processes.

This implies the following:

     o  Any single range of a file cannot be mapped multiply into
        different virtual address ranges.

     o  After the initial MAP_SHARED mmap() of a file range, all
        subsequent MAP_SHARED calls to mmap() to map the same range
        of a file must either specify MAP_VARIABLE in flags and
        inherit the virtual address range the system has chosen for
        this range, or specify MAP_FIXED with an addr that
        corresponds exactly to the address chosen by the system for
        the initial mapping. Only after all mappings for a file
        range have been destroyed can that range be mapped to a
        different virtual address.

     o  In most cases, two separate calls to mmap() cannot map
        overlapping ranges in a file.

[ that statement holds GLOBALLY, not per-process -- tgl ]

        The virtual address range
        reserved for a file range is determined at the time of the
        initial mapping of the file range into a process address
        space. The system allocates only the virtual address range
        necessary to represent the initial mapping. As long as the
        initial mapping exists, subsequent attempts to map a
        different file range that includes any portion of the
        initial range may fail with an ENOMEM error if an extended
        contiguous address range that preserves the mappings of the
        initial range cannot be allocated.

     o  Separate calls to mmap() to map contiguous ranges of a file
        do not necessarily return contiguous virtual address ranges.
        The system may allocate virtual addresses for each call to
        mmap() on a first available basis.
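
To make the consequences of these rules concrete, here is a toy model of the system-wide bookkeeping they imply for a single file. It is a simulation in plain Python, not a call into HP-UX or any real mmap interface: the class `GlobalRangeRegistry` and its methods are hypothetical, and it simplifies the man page in two ways that are assumptions of the sketch. Any overlap with an existing range is always rejected (the man page says such calls "may fail"), and addresses are handed out by a simple bump allocator that never reuses freed space.

```python
SHARED_BASE = 0x80000000   # start of the shared range quoted above
SHARED_LIMIT = 0xF0000000  # one past 0xefffffff


class GlobalRangeRegistry:
    """Toy, system-wide record of MAP_SHARED file ranges for one file."""

    def __init__(self):
        self._ranges = {}            # (offset, length) -> [address, refcount]
        self._next_addr = SHARED_BASE

    def map(self, offset, length):
        key = (offset, length)
        if key in self._ranges:
            # Same range mapped again (by any process): it must
            # inherit the address chosen for the initial mapping.
            entry = self._ranges[key]
            entry[1] += 1
            return entry[0]
        for (o, n) in self._ranges:
            if offset < o + n and o < offset + length:
                # Simplification: always fail, modelled on ENOMEM.
                raise MemoryError("overlaps an existing mapping")
        if self._next_addr + length > SHARED_LIMIT:
            raise MemoryError("shared address range exhausted")
        addr = self._next_addr
        self._next_addr += length
        self._ranges[key] = [addr, 1]
        return addr

    def unmap(self, offset, length):
        entry = self._ranges[(offset, length)]
        entry[1] -= 1
        if entry[1] == 0:
            # Only now could this range be mapped at a new address.
            del self._ranges[(offset, length)]


reg = GlobalRangeRegistry()
addr_backend1 = reg.map(0, 8192)      # first backend maps block 0
addr_backend2 = reg.map(0, 8192)      # second backend, same range
try:
    reg.map(4096, 8192)               # straddles block 0: rejected
    overlap_rejected = False
except MemoryError:
    overlap_rejected = True
```

The point of the model is that the mapping table is shared by every process on the machine, so one backend's choice of range constrains what all the others can map.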
