Re: Posix Shared Mem patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Posix Shared Mem patch
Date: 2012-06-28 05:00:07
Lists: pgsql-hackers

On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Would Posix shmem help with that at all?  Why did you choose not to
>>> use the Posix API, anyway?
>> It seemed more complicated.  If we use the POSIX API, we've got to
>> have code to find a non-colliding name for the shm, and we've got to
>> arrange to clean it up at process exit.  Anonymous shm doesn't require
>> a name and goes away automatically when it's no longer in use.
> I see.  Those are pretty good reasons ...

So, should we do it this way?

I did a little research and discovered that Linux 2.3.51 (released
3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS.
That combination is documented to work beginning in Linux 2.4.0.  How
worried should we be about people trying to run PostgreSQL 9.3 on
pre-2.4 kernels?  If we want to worry about it, we could try mapping a
one-page MAP_SHARED|MAP_ANONYMOUS segment first.  If that
works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS
facility and try to allocate the whole segment plus a minimal sysv
shm.  If the single page allocation fails with EINVAL, we could fall
back to allocating the entire segment as sysv shm.

A related question is - if we do this - should we enable it only on
ports where we've verified that it works, or should we just turn it on
everywhere and fix breakage if/when it's reported?  I lean toward the

If we find that there are platforms where (a) mmap is not supported or
(b) MAP_SHARED|MAP_ANON works but has the wrong semantics, we could
either shut off this optimization on those platforms by fiat, or we
could test not only that the call succeeds, but that it works
properly: create a one-page mapping and fork a child process; in the
child, write to the mapping and exit; in the parent, wait for the
child to exit and then test that we can read back the correct
contents.  This would protect against a hypothetical system where the
flags are accepted but fail to produce the correct behavior.  I'm
inclined to think this is over-engineering in the absence of evidence
that there are platforms that work this way.


Robert Haas
The Enterprise PostgreSQL Company