Dynamic shared memory areas

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Dynamic shared memory areas
Date: 2016-08-19 07:07:18
Message-ID: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

I would like to propose a new subsystem called Dynamic Shared [Memory]
Areas, or "DSA". It provides an object called a "dsa_area" which can
be used by multiple backends to share data. Under the covers, a
dsa_area is made up of some number of DSM segments, but it appears to
client code as a single shared memory heap with a simple allocate/free
interface. Because the memory is mapped at different addresses in
different backends, it introduces a kind of sharable relative pointer
and an operation to convert it to a backend-local pointer.

After you have created or attached to a dsa_area, you can use it much
like MemoryContextAlloc/pfree, except for the extra hoop to jump
through to get the local address:

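    dsa_pointer p;
    char       *mem;

    /* Allocate 42 bytes; p is a relative pointer usable in any backend. */
    p = dsa_allocate(area, 42);
    /* Convert it to an address that is valid only in this backend. */
    mem = (char *) dsa_get_address(area, p);
    if (mem != NULL)
    {
        snprintf(mem, 42, "Hello world");
        dsa_free(area, p);
    }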

Exposing the dsa_pointer in this way allows client code to build data
structures with internal dsa_pointers that will be usable in all
backends that attach to the dsa_area.
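
To illustrate why relative pointers inside a data structure survive
being mapped at a different address, here is a minimal self-contained
sketch. It does not use the DSA API at all: toy_rel_pointer,
toy_rel_get_address and the other toy_ names are hypothetical stand-ins,
and a second calloc'd buffer filled by memcpy plays the role of a second
backend's mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dsa_pointer: a byte offset from the base. */
typedef uint64_t toy_rel_pointer;
#define TOY_REL_INVALID ((toy_rel_pointer) 0)   /* offset 0 means "null" */

typedef struct toy_node
{
    int             value;
    toy_rel_pointer next;   /* relative pointer, not a raw address */
} toy_node;

/* Convert a relative pointer to an address valid in this mapping only. */
void *
toy_rel_get_address(void *base, toy_rel_pointer p)
{
    return p == TOY_REL_INVALID ? NULL : (void *) ((char *) base + p);
}

/* Sum the list starting at 'head', as seen through the mapping at 'base'. */
int
toy_sum_list(void *base, toy_rel_pointer head)
{
    int         sum = 0;
    toy_node   *node;

    for (node = toy_rel_get_address(base, head); node != NULL;
         node = toy_rel_get_address(base, node->next))
        sum += node->value;
    return sum;
}

/* Build a three-node list (1 -> 2 -> 3) inside 'base'; return its head. */
toy_rel_pointer
toy_build_list(void *base, size_t size)
{
    toy_rel_pointer first = 16; /* skip offset 0 so it can mean "null" */
    int         i;

    assert(size >= first + 3 * sizeof(toy_node));
    for (i = 0; i < 3; i++)
    {
        toy_node   *node;

        node = toy_rel_get_address(base, first + i * sizeof(toy_node));
        node->value = i + 1;
        node->next = (i < 2) ? first + (i + 1) * sizeof(toy_node)
            : TOY_REL_INVALID;
    }
    return first;
}

/* Simulate a second backend: same bytes, mapped at a different address. */
int
toy_demo(void)
{
    size_t      size = 128;
    void       *mapping_a = calloc(1, size);
    void       *mapping_b = calloc(1, size);
    toy_rel_pointer head;
    int         total;

    assert(mapping_a != NULL && mapping_b != NULL);
    head = toy_build_list(mapping_a, size);
    memcpy(mapping_b, mapping_a, size);
    total = toy_sum_list(mapping_a, head) + toy_sum_list(mapping_b, head);
    free(mapping_a);
    free(mapping_b);
    return total;
}
```

The same head value walks the list correctly through either mapping,
which would not be true if the nodes stored raw addresses.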

DSA areas have many potential uses, including shared workspaces for
various kinds of parallel query execution, longer term storage for
in-memory database objects, caches and so forth. In some cases it may
be useful to use a dsa_area directly, but there could be a library of
useful data structures that know how to use DSA memory. More on all
of those topics, with patches, soon.

SOME CONTEXT

Currently, Postgres provides three classes of memory:

1. Backend-local memory, managed with palloc/pfree, and MemoryContext
providing a hierarchy of memory heaps tied to various scopes.
Underneath that, there is of course the C runtime's heap and
allocator.

2. Traditional non-extensible shared memory mapped into every backend
at the same address. This works on Unix because child processes
inherit the memory map of the postmaster. In EXEC_BACKEND builds
(including Windows) it works because you can ask for memory to be
mapped at a specific address and it'll succeed if ASLR is turned off
and the backend hasn't been running very long and the address range
happens to be still free. This memory is currently managed with an
allocate-only allocator. There is a small library of data structures
that know how to use (but never free) this memory.

3. DSM memory, our abstraction for shared memory segments created on
demand in non-postmaster backends. This memory is mapped at different
addresses in different backends. Currently its main use is to provide
a chunk of memory for parallel query. To manage the space inside a
DSM segment, shm_toc ('table-of-contents') can be used as a kind of
allocate-only space manager which allows backends to find the
backend-local address of objects within the segment using integer
keys.

This proposal adds a fourth class, building on the third. Compared
with the existing memory classes:

* It provides a fully general allocate/free facility, as currently
available only in (1), though it does not have (1)'s directly
dereferenceable pointers.

* It grows automatically and can in theory grow as big as virtual
memory allows, like (1), though it also provides a way to cap total
size so that allocations fail beyond some size.

* It provides something like the throw-it-all-away-at-once clean-up
facility of (1), since DSA areas can be destroyed, are reference
counted, and can optionally be tracked by the resource manager
mechanism (riding on DSM's coat tails).

* It provides the data sharing between backends of (2) and (3), though
it doesn't have (2)'s directly dereferenceable pointers.

* Through proposals that will follow this one, it will provide for
basic data structures that build on top of it such as hash tables,
like (2), except that these ones will be able to grow as required and
give memory back.

* Unlike (1) and (2), client code has to deal with incompatible memory
maps. This involves calling dsa_get_address(area, relative_pointer)
which amounts to a few instructions to perform a base address lookup
and pointer arithmetic.
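
As a rough sketch of what "a base address lookup and pointer
arithmetic" can look like when an area spans several segments, the toy
code below packs a segment index into the high bits of a 64 bit value
and an offset into the low bits. The names (toy_dsa_pointer, toy_area,
toy_area_get_address), the 40 bit offset width and the fixed-size base
table are all assumptions made for illustration; the patch's actual
layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dsa_pointer;

#define TOY_OFFSET_BITS  40     /* assumed split, for illustration only */
#define TOY_OFFSET_MASK  ((((uint64_t) 1) << TOY_OFFSET_BITS) - 1)
#define TOY_MAX_SEGMENTS 8

/* Per-backend state: where each segment happens to be mapped locally. */
typedef struct toy_area
{
    char       *segment_base[TOY_MAX_SEGMENTS];
} toy_area;

/* Pack a segment index and an offset into one sharable value. */
toy_dsa_pointer
toy_make_pointer(uint64_t segment_index, uint64_t offset)
{
    assert(segment_index < TOY_MAX_SEGMENTS);
    assert(offset <= TOY_OFFSET_MASK);
    return (segment_index << TOY_OFFSET_BITS) | offset;
}

/* Look up the segment's local base address and add the offset. */
void *
toy_area_get_address(const toy_area *area, toy_dsa_pointer p)
{
    uint64_t    segment_index = p >> TOY_OFFSET_BITS;
    uint64_t    offset = p & TOY_OFFSET_MASK;

    assert(segment_index < TOY_MAX_SEGMENTS);
    assert(area->segment_base[segment_index] != NULL);
    return area->segment_base[segment_index] + offset;
}
```

Two backends would each hold their own toy_area with different base
addresses, yet the same toy_dsa_pointer value would resolve to the same
logical byte in both.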

Using processes instead of threads gives Postgres certain advantages,
but requires us to deal with shared memory instead of just using
something like (1) for all our memory needs, as a hypothetical
multi-threaded Postgres fork would presumably do. This proposal is a
step towards making our shared memory facilities more powerful and
general.

IMPLEMENTATION AND HISTORY

Back in 2014, Robert Haas proposed sb_alloc[1]. It had two layers:

* a 'free page manager' which cuts a piece of memory into 4KB pages
and embeds a btree into the empty pages to track contiguous runs of
pages, so that you can get and put free page ranges

* an allocator which manages a set of backend-private memory regions,
each of which has a free page manager; large allocations are handled
directly with pages from the free page manager in an existing region,
or new regions created as required with malloc; allocations <= 8KB are
handled with pools (called "heaps" in that patch) of various object
sizes ("size classes") that live in 64KB superblocks, which in turn
come from the free page manager
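
The arithmetic implied by those two layers can be sketched as follows.
This is only an illustration of the numbers quoted above (4KB pages,
64KB superblocks, an 8KB threshold); the function and macro names are
hypothetical, and the sketch ignores any header space a superblock might
reserve.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE        4096           /* 4KB pages */
#define TOY_SUPERBLOCK_SIZE  (64 * 1024)    /* 64KB superblocks */
#define TOY_POOL_LIMIT       (8 * 1024)     /* requests <= 8KB use pools */

/* Returns 1 if a request would be served from a size-class pool. */
int
toy_uses_pool(size_t size)
{
    return size <= TOY_POOL_LIMIT;
}

/* Whole pages a large request would take from the free page manager. */
size_t
toy_pages_needed(size_t size)
{
    assert(!toy_uses_pool(size));
    return (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Objects of one size class that fit in a superblock (no header assumed). */
size_t
toy_objects_per_superblock(size_t object_size)
{
    assert(object_size > 0 && object_size <= TOY_POOL_LIMIT);
    return TOY_SUPERBLOCK_SIZE / object_size;
}
```

For example, a 100000 byte request would bypass the pools and take 25
pages, while 48 byte objects would pack 1365 to a superblock under
these assumptions.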

DSA uses Robert's free page manager unchanged, except for some
debugging by me. It uses the same general approach and much of the
code for the higher level allocator, but I have reworked it
substantially to replace the MemoryContext interface, put it in DSM
segments, introduce the multi-segment relative pointer scheme, and add
concurrency support.

Compared to some well known malloc implementations which this code
takes general inspiration from, the main differences are obviously the
shared memory nature, the lack of per-core pools (an avenue for future
research that would increase concurrent performance at the cost of
increased fragmentation), and the presence of that lower level page
manager.
Some other systems go directly to the OS (mmap, sbrk) for superblocks
and large objects. The equivalent for us would be to throw away the
lower layer and simply create a DSM segment for large allocations and
64KB superblocks, but there are implementation and portability reasons
not to want to create very large numbers of DSM segments.

Compared to palloc/pfree, DSA aims to waste less space. It has more
finely grained size classes (8, 16, 24, 32, 40, 48, ... see
dsa_size_classes), and it uses a page map costing 8 bytes per 4KB page
to keep track of how to free memory, instead of putting bookkeeping
information in front of every object.
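
To put rough numbers on that, the sketch below rounds a request up to
the next multiple of 8, which matches only the small classes listed
above (the real table in dsa_size_classes may be spaced differently at
larger sizes), and compares it with power-of-two rounding. The
power-of-two scheme is an assumption chosen here as a coarser point of
comparison, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round up to a multiple of 8, matching the small classes 8, 16, 24, ... */
size_t
toy_round_to_multiple_of_8(size_t size)
{
    assert(size > 0);
    return (size + 7) & ~(size_t) 7;
}

/* Round up to a power of two: an assumed coarser scheme for comparison. */
size_t
toy_round_to_power_of_2(size_t size)
{
    size_t      result = 8;

    assert(size > 0);
    while (result < size)
        result <<= 1;
    return result;
}

/* Page map cost in bytes to track 'npages' 4KB pages at 8 bytes each. */
size_t
toy_page_map_bytes(size_t npages)
{
    return npages * 8;
}
```

Under these assumptions a 42 byte request occupies 48 bytes rather than
64, and tracking a 64KB superblock (16 pages) costs 128 bytes of page
map regardless of how many objects it holds.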

Some other notes in no particular order:

* It's admittedly slightly confusing that the patch currently contains
two separate relative pointer concepts: relptr is used by Robert's
freespace.c code and provides for sort-of-type-checked offsets relative
to a single base, and dsa_pointer is used by dsa.c to provide
multi-segment relative pointers that encode a segment index in the
higher bits.

* The lock tranche arguments to dsa_create_dynamic are clunky, but I
don't have a better idea currently since you can't allocate and free
tranche IDs so I don't see how dsa.c can own that problem.

* The "dynamic" part of dsa_create_dynamic's name reflects a desire to
have an alternative "fixed" version where you can provide it with an
already existing piece of memory to manage, such as a pre-existing DSM
segment, but that has not been implemented.

* It's desirable to allow atomic ops on dsa_pointer; I believe Andres
Freund plans to make that happen for 64 bit values on 32 bit systems,
but if that turns out to be problematic I would want to make
dsa_pointer 32 bits on 32 bit systems.

PATCH

First, please apply dsm-unpin-segment-v2.patch[2], and then
dsm-handle-invalid.patch (attached, and also proposed), and finally
dsa-v1.patch. I have also attached test-dsa.patch, a small module
which exercises the allocator and shows some client code.

Thanks to my colleagues Robert Haas for the sb_alloc code that morphed
into this patch, and John Gorman and Amit Khandekar for feedback and
testing.

I'd be most grateful for any feedback. Thanks for reading!

[1] https://www.postgresql.org/message-id/flat/CA%2BTgmobkeWptGwiNa%2BSGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm%3D29DZeWf44-4fzciAQ14iY5vCVZ6RUJ-KR2yzs3hPzrkw%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment                 Content-Type               Size
dsm-handle-invalid.patch   application/octet-stream   2.1 KB
dsa-v1.patch               application/octet-stream   129.4 KB
test-dsa.patch             application/octet-stream   11.6 KB
