Re: dynamic shared memory

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic shared memory
Date: 2013-09-03 03:31:50
Message-ID: 20130903033150.GA119849@tornado.leadboat.com
Lists: pgsql-hackers

On Tue, Sep 03, 2013 at 12:52:22AM +0200, Andres Freund wrote:
> On 2013-09-01 12:07:04 -0400, Noah Misch wrote:
> > On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote:
> > > On 2013-09-01 09:24:00 -0400, Noah Misch wrote:
> > > > The difficulty depends on whether processes other than the segment's creator
> > > > will attach anytime or only as they start. Attachment at startup is enough
> > > > for parallel query, but it's not enough for something like lock table
> > > > expansion. I'll focus on the attach-anytime case since it's more general.
> > >
> > > Even on startup it might get more complicated than one immediately
> > > imagines on EXEC_BACKEND type platforms, because their memory layout
> > > need not be the same. The more shared memory you need, the harder
> > > that will be.
> >
> > Non-Windows EXEC_BACKEND is already facing a dead end that way.
>
> Not sure whether you mean non-windows EXEC_BACKEND isn't going to be
> supported for much longer or that it already has problems.

It already has problems: ASLR measures sometimes prevent reattachment of the
main shared memory segment. Multiplying the combined size of our
fixed-address mappings does not push us over some threshold where this becomes
a problem, because it is already a problem.

> > > Note that allocating a large mapping, even without using it, has
> > > noticeable cost, at least under Linux. The kernel has to create and
> > > copy data to track each page's state (without copying the memory
> > > contents themselves, due to COW) for every subsequent fork.

> So, after reading up on the issue a bit more and reading some more
> kernel code: a large mmap(PROT_NONE, MAP_PRIVATE) won't cause many
> problems beyond counting against ulimit -v. It will *not* cause
> overcommit violations. mmap(PROT_NONE, MAP_SHARED) will, though, even
> if not yet faulted. Which means that to be reliable and not violate
> overcommit we'd need to munmap() a chunk of the PROT_NONE, MAP_PRIVATE
> memory and immediately (without interceding mallocs, using mmap
> itself) map it again.
>
> It only gets really expensive, in the sense of making fork expensive,
> if you set protections on many regions in that mapping individually.
> Each mprotect() call will split the VMA into distinct pieces, and they
> won't get merged even when neighbors have the same settings.

Thanks for researching that.

> > > > I don't foresee fundamental differences on 32-bit. All the allocation
> > > > maximums scale down, but that's the usual story for 32-bit.
> > >
> > > If you actually want to allocate memory after starting up, without
> > > carving out a section for it from the beginning, memory
> > > fragmentation will make it very hard to find the same free
> > > addresses across processes.
> >
> > True. I wouldn't feel bad if total dynamic shared memory usage above, say,
> > 256 MiB were unreliable on 32-bit. If you're still running 32-bit in 2015,
> > you probably have a low-memory platform.
>
> Not sure. I think that will partially depend on whether x32 has any
> success, which I still find hard to judge.

I won't hold my breath for x32 becoming a common platform for high-memory
database servers, regardless of other successes it might find. Not
impossible, but I recommend placing trivial priority on maximizing performance
for that scenario.

> > I think the take-away is that we have a lot of knobs available, not a bright
> > line between possible and impossible. Robert opted to omit provision for
> > reliable fixed addresses, and the upsides of that decision are the absence of
> > a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are not
> > used, and a clearer portability outlook.
>
> I guess my point is that if we want to develop stuff that requires
> reliable addresses, we should build support for that from a low level
> up, not rely on a hack^Wlayer on top of the actual dynamic shared
> memory API.
> That is, it should be a flag to dsm_create() that we require a fixed
> address, and dsm_attach() will then automatically use that or die
> trying. Requiring implementations to take care of passing addresses
> around and fiddling with mmap/the Windows API to make sure those
> mappings are possible doesn't strike me as a good idea.

I agree.

> In the end, you're going to be the primary/first user as far as I
> understand things, so you'll have to argue whether we need fixed
> addresses or not. I don't think it's a good idea to forgo this
> decision at this layer and bolt another one on top if we decide it's
> necessary.

We don't need fixed addresses. Parallel internal sort will probably include
the equivalent of a SortTuple array in its shared memory segment, and that
implies relative pointers to the tuples also stored in shared memory. I
expect that wart to be fairly isolated within the code, so little harm done.

I don't think we will have painted ourselves into a corner at all, should
we wish to lift the limitation later.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
