Re: PATCH: two slab-like memory allocators

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-12 19:12:57
Message-ID: 20161112191257.3a4n6kzegvu637qn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...

+ * IDENTIFICATION
+ * src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and various
+ * optimizations that are not possible in AllocSet. Firstly, we can get rid
+ * of the doubling and carve the blocks into chunks of exactly the right size
+ * (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can be quite easily be made
without referring to a concrete reference to behaviour elsewhere.

+ *
+ * At the context level, we use 'freelist' array to track blocks grouped by
+ * number of free chunks. For example freelist[0] is a list of completely full
+ * blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?

+ * About MEMORY_CONTEXT_CHECKING:
+ *
+ * Since we usually round request sizes up to the next power of 2, there
+ * is often some unused space immediately after a requested data
area.

I assume the "round up" stuff is copy-paste?

+ * Thus, if someone makes the common error of writing past what they've
+ * requested, the problem is likely to go unnoticed ... until the day when
+ * there *isn't* any wasted space, perhaps because of different memory
+ * alignment on a new platform, or some other effect. To catch this sort
+ * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ * the requested space whenever the request is less than the actual chunk
+ * size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ * About USE_VALGRIND and Valgrind client requests:
+ *
+ * Valgrind provides "client request" macros that exchange information with
+ * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out
+ * currently-used macros.
+ *
+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation. The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to able to examine the standard chunk header
+ * fields. Therefore, we use, when available, the requested_size field and
+ * any subsequent padding. requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller. However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.
+ *
+ * The rounded-up capacity of the chunk usually acts as a post-allocation
+ * NOACCESS region. If the request consumes precisely the entire chunk,
+ * there is no such region; another chunk header may immediately follow. In
+ * that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ * See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.

+typedef struct SlabBlockData *SlabBlock; /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+ MemoryContextData header; /* Standard memory-context fields */
+ /* Allocation parameters for this context: */
+ Size chunkSize; /* chunk size */
+ Size fullChunkSize; /* chunk size including header and alignment */
+ Size blockSize; /* block size */
+ int chunksPerBlock; /* number of chunks per block */
+ int minFreeChunks; /* min number of free chunks in any block */
+ int nblocks; /* number of blocks allocated */
+ /* Info about storage allocated in this context: */
+ SlabBlock freelist[1]; /* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ * Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).

+/*
+ * SlabChunk
+ * The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to fields in front of the StandardChunkHeader fields,
+ * which is used to add pointer to the block.
+ */

Wouldn't that be easier to enforce - particularly around alignment
requirements - by embedding a StandardChunkHeader here? That'd also
avoid redundancies.

+/* ----------
+ * Debug macros
+ * ----------
+ */
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "SlabFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#else
+#define SlabFreeInfo(_cxt, _chunk)
+#define SlabAllocInfo(_cxt, _chunk)
+#endif

Do we really have to copy that stuff from aset.c? Obviously no-one uses
that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
defined:
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \

+#ifdef CLOBBER_FREED_MEMORY
+
+/* Wipe freed memory for debugging purposes */
+static void
+wipe_mem(void *ptr, size_t size)

+#ifdef MEMORY_CONTEXT_CHECKING
+static void
+set_sentinel(void *base, Size offset)
+
+static bool
+sentinel_ok(const void *base, Size offset)
+#endif

+#ifdef RANDOMIZE_ALLOCATED_MEMORY
+static void
+randomize_mem(char *ptr, size_t size)

Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
static inlines.

+MemoryContext
+SlabContextCreate(MemoryContext parent,
+ const char *name,
+ Size blockSize,
+ Size chunkSize)
+{
+ int chunksPerBlock;
+ Size fullChunkSize;
+ Slab set;
+
+ /* chunk, including SLAB header (both addresses nicely aligned) */
+ fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

+ /* make sure the block can store at least one chunk (plus a bitmap) */
+ if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
+ elog(ERROR, "block size %ld for slab is too small for chunks %ld",
+ blockSize, chunkSize);

I assume the 1 is the bitmap size?

+ /* so how many chunks can we fit into a block, including header and bitmap? */
+ chunksPerBlock
+ = (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);

I'm slightly drunk due to bad airline wine, but right now that seems a
bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
pattern elsewhere, but right now I can't follow the - 7.

+/*
+ * SlabAlloc
+ * Returns pointer to allocated memory of given size or NULL if
+ * request could not be completed; memory is added to the set.
+ *
+ * No request may exceed:
+ * MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
+ * All callers use a much-lower limit.

That seems like a meaningless comment in the context of a slab allocator
with a fixed size.

+ /*
+ * If there are no free chunks in any existing block, create a new block
+ * and put it to the last freelist bucket.
+ *
+ * (set->minFreeChunks == 0) means there are no blocks with free chunks,
+ * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+ */
+ if (set->minFreeChunks == 0)
+ {
+ block = (SlabBlock)malloc(set->blockSize);

Space after cast - maybe run pgindent over the file before submission?
Doing that manually helps to avoid ugly damange by the per-release run
later. I'm pretty sure there'll be a significant number of changes.

+ if (block->nfree == 0)
+ block->firstFreeChunk = set->chunksPerBlock;
+ else
+ {
+ /* look for the next free chunk in the block, after the first one */
+ while ((++block->firstFreeChunk) < set->chunksPerBlock)
+ {
+ int byte = block->firstFreeChunk / 8;
+ int bit = block->firstFreeChunk % 8;
+
+ /* stop when you find 0 (unused chunk) */
+ if (! (block->bitmapptr[byte] & (0x01 << bit)))
+ break;
+ }

I haven't profiled (or even compiled) this patchset yet, but FWIW, in
the tuple deforming code, I could measure a noticeable speedup by
accessing bitmap-bytes in the native word-size, rather than char. I'm
*NOT* saying you should do that, but if this ever shows up as a
bottlneck, it might be worthwhile to optimize.

+ /*
+ * And finally update minFreeChunks, i.e. the index to the block with the
+ * lowest number of free chunks. We only need to do that when the block
+ * got full (otherwise we know the current block is the right one).
+ * We'll simply walk the freelist until we find a non-empty entry.
+ */
+ if (set->minFreeChunks == 0)
+ for (idx = 1; idx <= set->chunksPerBlock; idx++)
+ if (set->freelist[idx])
+ {
+ set->minFreeChunks = idx;
+ break;
+ }

Yuck. This definitely needs braces.

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-11-12 19:18:25 Re: Logical Replication WIP
Previous Message Karl O. Pinc 2016-11-12 18:59:46 Re: Patch to implement pg_current_logfile() function