Re: dynamic shared memory

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic shared memory
Date: 2013-08-27 14:07:33
Message-ID: 20130827140733.GD24807@alap2.anarazel.de
Lists: pgsql-hackers

Hi Robert,

[just sending an email which sat in my outbox for two weeks]

On 2013-08-13 21:09:06 -0400, Robert Haas wrote:
> ...

Nice to see this coming. I think it will actually be interesting for
quite some things outside parallel query, but we'll see.

I've not yet looked at the code, so I just have some high-level comments
so far.

> To help solve these problems, I invented something called the "dynamic
> shared memory control segment". This is a dynamic shared memory
> segment created at startup (or reinitialization) time by the
> postmaster before any user processes are created. It is used to store a
> list of the identities of all the other dynamic shared memory segments
> we have outstanding and the reference count of each. If the
> postmaster goes through a crash-and-reset cycle, it scans the control
> segment and removes all the other segments mentioned there, and then
> recreates the control segment itself. If the postmaster is killed off
> (e.g. kill -9) and restarted, it locates the old control segment and
> proceeds similarly.

That way, any corruption in that area will prevent restarts without a
reboot unless you use ipcrm or similar, right?
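
A minimal sketch of the bookkeeping described in the quoted paragraph, to make the corruption concern concrete. Every name here (sketch_control, sketch_collect_stale, the magic value, the fixed slot count) is hypothetical and not taken from the patch; the sanity check on the header is an assumption about how a restart might refuse to trust a damaged control segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAGIC     0x44534d43u   /* arbitrary marker, hypothetical */
#define SKETCH_MAX_ITEMS 8             /* real sizing would follow MaxBackends */

typedef uint32_t sketch_handle;

typedef struct
{
    sketch_handle handle;   /* identity of one dynamic segment */
    uint32_t      refcnt;   /* 0 means the slot is free */
} sketch_item;

typedef struct
{
    uint32_t    magic;      /* checked before the contents are trusted */
    uint32_t    nitems;     /* slots handed out so far */
    sketch_item item[SKETCH_MAX_ITEMS];
} sketch_control;

/*
 * Scan an old control segment after a crash and copy the handle of every
 * still-referenced segment into out[] (room for SKETCH_MAX_ITEMS entries),
 * clearing each slot as it goes.  Returns the number of handles collected,
 * or -1 if the header looks corrupt and nothing in it can be trusted.
 */
int
sketch_collect_stale(sketch_control *ctl, sketch_handle *out)
{
    uint32_t i;
    int      n = 0;

    assert(ctl != NULL && out != NULL);

    if (ctl->magic != SKETCH_MAGIC || ctl->nitems > SKETCH_MAX_ITEMS)
        return -1;

    for (i = 0; i < ctl->nitems; i++)
    {
        if (ctl->item[i].refcnt == 0)
            continue;       /* free slot, nothing to remove */
        out[n++] = ctl->item[i].handle;
        ctl->item[i].refcnt = 0;
    }
    return n;
}
```

The -1 path is the case raised above: if the header cannot be trusted, the handles of the leftover segments are unknown, so they cannot be removed automatically.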

> Creating a shared memory segment is a somewhat operating-system
> dependent task. I decided that it would be smart to support several
> different implementations and to let the user choose which one they'd
> like to use via a new GUC, dynamic_shared_memory_type.

I think we want that during development, but I'd rather not go there
when releasing. After all, we don't support a manual choice between
anonymous mmap/sysv shmem either.

> In addition, I've included an implementation based on mmap of a plain
> file. As compared with a true shared memory implementation, this
> obviously has the disadvantage that the OS may be more likely to
> decide to write back dirty pages to disk, which could hurt
> performance. However, I believe it's worthy of inclusion all the
> same, because there are a variety of situations in which it might be
> more convenient than one of the other implementations. One is
> debugging.

Hm. Not sure what the advantage over a core file is here.
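
For reference, the file-backed variant under discussion can be sketched roughly as follows. This is an illustration under assumptions, not the patch's code: sketch_map_file is a hypothetical helper, and the file path and permissions are up to the caller. It uses only open(), ftruncate(), and mmap() with MAP_SHARED.

```c
#define _DEFAULT_SOURCE     /* for ftruncate() on glibc under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of the file at `path` as shared memory, creating the file
 * and setting its size if necessary.  Returns NULL on failure.
 */
void *
sketch_map_file(const char *path, size_t len)
{
    int   fd;
    void *addr;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, (off_t) len) != 0)
    {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping outlives the descriptor */

    return addr == MAP_FAILED ? NULL : addr;
}
```

Two calls with the same path, whether from one process or from two, yield mappings backed by the same file pages, so a store through one is visible through the other. The file also remains readable with ordinary tools, which is the debugging convenience the quoted text refers to.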

> On MacOS X, for example, there seems to be no way to list
> POSIX shared memory segments, and no easy way to inspect the contents
> of either POSIX or System V shared memory segments.

Shouldn't we ourselves know which segments are around?

> Another use case
> is working around an administrator-imposed or OS-imposed shared memory
> limit. If you're not allowed to allocate shared memory, but you are
> allowed to create files, then this implementation will let you use
> whatever facilities we build on top of dynamic shared memory anyway.

I don't think we should try to work around limits like that.

> A third possible reason to use this implementation is
> compartmentalization. For example, you can put the directory that
> stores the dynamic shared memory segments on a RAM disk - which
> removes the performance concern - and then do whatever you like with
> that directory: secure it, put filesystem quotas on it, or sprinkle
> magic pixie dust on it. It doesn't even seem out of the question that
> there might be cases where there are multiple RAM disks present with
> different performance characteristics (e.g. on NUMA machines) and this
> would provide fine-grained control over where your shared memory
> segments get placed. To make a long story short, I won't be crushed
> if the consensus is against including this, but I think it's useful.

-1 so far. Seems a bit handwavy to me.

> Other implementations are imaginable but not implemented here. For
> example, you can imagine using the mmap() of an anonymous file.
> However, since the point is that these segments are created on the fly
> by individual backends and then shared with other backends, that gets
> a little tricky. In order for the second backend to map the same
> anonymous shared memory segment that the first one mapped, you'd have
> to pass the file descriptor from one process to the other.

It wouldn't even work. Several mappings of /dev/zero et al. do *not*
result in the same virtual memory being mapped. Not even when using the
same (passed around) fd.
Believe me, I tried ;)
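
A small sketch of the experiment, assuming Linux behaviour as I understand it (each MAP_SHARED mapping of /dev/zero gets its own fresh zero-filled memory). The function name is hypothetical. Both mappings here live in one process, but they go through the same descriptor, which is the situation a passed-around fd would produce.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map /dev/zero twice through one descriptor, store through the first
 * mapping, and check whether the second mapping sees the store.
 * Returns 1 if it does, 0 if it does not, -1 on error.
 */
int
sketch_dev_zero_is_shared(size_t len)
{
    int   fd;
    int   visible;
    char *a;
    char *b;

    assert(len > 0);

    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    if (a == MAP_FAILED || b == MAP_FAILED)
    {
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
        return -1;
    }

    a[0] = 42;
    visible = (b[0] == 42);

    munmap(a, len);
    munmap(b, len);
    return visible;
}
```

On the systems where the claim above holds, the function returns 0: the two mappings are independent even though they share a descriptor.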

> There are quite a few problems that this patch does not solve. First,
> while it does give you a shared memory segment, it doesn't provide you
> with any help at all in figuring out what to put in that segment. The
> task of figuring out how to communicate usefully through shared memory
> is thus, for the moment, left entirely to the application programmer.
> While there may be cases where that's just right, I suspect there will
> be a wider range of cases where it isn't, and I plan to work on some
> additional facilities, sitting on top of this basic structure, next,
> though probably as a separate patch.

Agreed.

> Second, it doesn't make any
> policy decisions about what is sensible either in terms of number of
> shared memory segments or the sizes of those segments, even though
> there are serious practical limits in both cases. Actually, the total
> number of segments system-wide is limited by the size of the control
> segment, which is sized based on MaxBackends. But there's nothing to
> keep a single backend from eating up all the slots, even though that's
> both pretty unfriendly and unportable, and there's no real limit to
> the amount of memory it can gobble up per slot, either. In other
> words, it would be a bad idea to write a contrib module that exposes a
> relatively uncooked version of this layer to the user.

To be honest, I am rather unconcerned about this at this point.

> --- /dev/null
> +++ b/src/include/storage/dsm.h
> @@ -0,0 +1,40 @@
> +/*-------------------------------------------------------------------------
> + *
> + * dsm.h
> + * manage dynamic shared memory segments
> + *
> + * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/storage/dsm.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef DSM_H
> +#define DSM_H
> +
> +#include "storage/dsm_impl.h"
> +
> +typedef struct dsm_segment dsm_segment;
> +
> +/* Initialization function. */
> +extern void dsm_postmaster_startup(void);
> +
> +/* Functions that create, update, or remove mappings. */
> +extern dsm_segment *dsm_create(uint64 size, char *preferred_address);
> +extern dsm_segment *dsm_attach(dsm_handle h, char *preferred_address);
> +extern void *dsm_resize(dsm_segment *seg, uint64 size,
> + char *preferred_address);
> +extern void *dsm_remap(dsm_segment *seg, char *preferred_address);
> +extern void dsm_detach(dsm_segment *seg);

Why do we want to expose something as unreliable as preferred_address in
the external interface? I haven't read the code yet, so I might be
missing something here.
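
To illustrate what "unreliable" means here, a minimal sketch assuming preferred_address would end up as the first argument of mmap() without MAP_FIXED (I have not checked whether the patch does that). In that mode the address is only a hint, and the kernel is free to place the mapping elsewhere. sketch_map_with_hint is a hypothetical helper.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask for `len` bytes of anonymous memory, suggesting `hint` as the address.
 * Without MAP_FIXED the hint is advisory only.  Returns NULL on failure.
 */
void *
sketch_map_with_hint(void *hint, size_t len)
{
    void *addr;

    assert(len > 0);

    addr = mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return addr == MAP_FAILED ? NULL : addr;
}
```

Calling this twice and passing the first result as the hint for the second call returns a different address the second time, because the requested range is already occupied. Any caller relying on the preferred address therefore has to compare the result against the hint and cope with a mismatch.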

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
