Re: PATCH: two slab-like memory allocators

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-14 00:20:00
Message-ID: efd47210-cf95-d7ee-e984-a873f76596a8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/12/2016 08:12 PM, Andres Freund wrote:
> Hi,
>
> Subject: [PATCH 1/2] slab allocator
>
> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 6ad7e7d..520f295 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
>
> I'd rather have that in a separate followup commit...
>
>
> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + * The constant allocation size allows significant simplification and various
> + * optimizations that are not possible in AllocSet. Firstly, we can get rid
> + * of the doubling and carve the blocks into chunks of exactly the right size
> + * (plus alignment), not wasting memory.
>
> References to AllocSet aren't necessarily a good idea, they'll quite
> possibly get out of date. The argument can be quite easily be made
> without referring to a concrete reference to behaviour elsewhere.
>

Yeah, that's probably true.

> + *
> + * At the context level, we use 'freelist' array to track blocks grouped by
> + * number of free chunks. For example freelist[0] is a list of completely full
> + * blocks, freelist[1] is a block with a single free chunk, etc.
>
> Hm. Those arrays are going to be quite large for small allocations w/
> big blocks (an imo sensible combination). Maybe it'd make more sense to
> model it as a linked list of blocks? Full blocks are at one end, empty
> ones at the other?

So there'd be one huge list of blocks, sorted by the number of empty
chunks? Hmm, that might work I guess.

I don't think the combination of large blocks with small allocations is
particularly sensible, though - what exactly would be the benefit of
such combination? I would even consider enforcing some upper limit on
the number of chunks per block - say, 256, for example.

>
>
> + * About MEMORY_CONTEXT_CHECKING:
> + *
> + * Since we usually round request sizes up to the next power of 2, there
> + * is often some unused space immediately after a requested data
> area.
>
> I assume the "round up" stuff is copy-paste?
>

Yeah, sorry about that.

>
> + * Thus, if someone makes the common error of writing past what they've
> + * requested, the problem is likely to go unnoticed ... until the day when
> + * there *isn't* any wasted space, perhaps because of different memory
> + * ...
> + *
> + * See also the cooperating Valgrind client requests in mcxt.c.
>
> I think we need a preliminary patch moving a lot of this into something
> like mcxt_internal.h. Copying this comment, and a lot of the utility
> functions, into every memory context implementation is a bad pattern.
>

Yes, I planned to do that for the next version of patch. Laziness.

>
> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
> +typedef struct SlabChunkData *SlabChunk;
>
> Can we please not continue hiding pointers behind typedefs? It's a bad
> pattern, and that it's fairly widely used isn't a good excuse to
> introduce further usages of it.
>

Why is it a bad pattern?

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> + MemoryContextData header; /* Standard memory-context fields */
> + /* Allocation parameters for this context: */
> + Size chunkSize; /* chunk size */
> + Size fullChunkSize; /* chunk size including header and alignment */
> + Size blockSize; /* block size */
> + int chunksPerBlock; /* number of chunks per block */
> + int minFreeChunks; /* min number of free chunks in any block */
> + int nblocks; /* number of blocks allocated */
> + /* Info about storage allocated in this context: */
> + SlabBlock freelist[1]; /* free lists (block-level) */
>
> I assume this is a variable-length array? If so, that a) should be
> documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
> actually will cause compiler warnings and potential misoptimizations.
>

Will fix, thanks.

> +/*
> + * SlabBlockData
> + * Structure of a single block in SLAB allocator.
> + *
> + * slab: context owning this block
>
> What do we need this for?
>

You're right the pointer to the owning context is unnecessary - there's
nothing like "standard block header" and we already have the pointer in
the standard chunk header. But maybe keeping the pointer at least with
MEMORY_CONTEXT_CHECKING would be a good idea?

> + * prev, next: used for doubly-linked list of blocks in global freelist
>
> I'd prefer using an embedded list here (cf. ilist.h).
>

Makes sense.

> +/*
> + * SlabChunk
> + * The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + * However it's possible to fields in front of the StandardChunkHeader fields,
> + * which is used to add pointer to the block.
> + */
>
> Wouldn't that be easier to enforce - particularly around alignment
> requirements - by embedding a StandardChunkHeader here? That'd also
> avoid redundancies.
>

Also makes sense.

> +/* ----------
> + * Debug macros
> + * ----------
> + */
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> + fprintf(stderr, "SlabFree: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> + fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#else
> +#define SlabFreeInfo(_cxt, _chunk)
> +#define SlabAllocInfo(_cxt, _chunk)
> +#endif
>
> Do we really have to copy that stuff from aset.c? Obviously no-one uses
> that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
> defined:
> /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
> fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
>

I don't really care. Sure, we should fix the warning, but not supporting
HAVE_ALLOCINFO in the new allocator(s) seems wrong - we should either
support it everywhere, or we should rip it out. That's not the purpose
of this patch, though.

>
> +#ifdef CLOBBER_FREED_MEMORY
> +
> +/* Wipe freed memory for debugging purposes */
> +static void
> +wipe_mem(void *ptr, size_t size)
>
> +#ifdef MEMORY_CONTEXT_CHECKING
> +static void
> +set_sentinel(void *base, Size offset)
> +
> +static bool
> +sentinel_ok(const void *base, Size offset)
> +#endif
>
> +#ifdef RANDOMIZE_ALLOCATED_MEMORY
> +static void
> +randomize_mem(char *ptr, size_t size)
>
> Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
> static inlines.
>

Yes, next to the valgrind stuff.

> +MemoryContext
> +SlabContextCreate(MemoryContext parent,
> + const char *name,
> + Size blockSize,
> + Size chunkSize)
> +{
> + int chunksPerBlock;
> + Size fullChunkSize;
> + Slab set;
> +
> + /* chunk, including SLAB header (both addresses nicely aligned) */
> + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> + /* make sure the block can store at least one chunk (plus a bitmap) */
> + if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
> + elog(ERROR, "block size %ld for slab is too small for chunks %ld",
> + blockSize, chunkSize);
>
> I assume the 1 is the bitmap size?
>

Yes, the smallest bitmap is 1 byte.

>
> + /* so how many chunks can we fit into a block, including header and bitmap? */
> + chunksPerBlock
> + = (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);
>
> I'm slightly drunk due to bad airline wine, but right now that seems a
> bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
> pattern elsewhere, but right now I can't follow the - 7.
>

We need all the bits (header, chunks and bitmap) to fit onto the block,
so this needs to hold:

blockSize >= sizeof(SlabBlockData) +
chunksPerBlock * fullChunkSize +
(chunksPerBlock + 7) / 8

solve for 'chunksPerBlock' and you'll get the above formula. Moving the
7 to the other side of the inequality is the reason for the minus.

But documenting this is probably a good idea.

> +/*
> + * SlabAlloc
> + * Returns pointer to allocated memory of given size or NULL if
> + * request could not be completed; memory is added to the set.
> + *
> + * No request may exceed:
> + * MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
> + * All callers use a much-lower limit.
>
> That seems like a meaningless comment in the context of a slab allocator
> with a fixed size.
>

Why? It might be worth moving this to SlabContextCreate though.

>
> + /*
> + * If there are no free chunks in any existing block, create a new block
> + * and put it to the last freelist bucket.
> + *
> + * (set->minFreeChunks == 0) means there are no blocks with free chunks,
> + * thanks to how minFreeChunks is updated at the end of SlabAlloc().
> + */
> + if (set->minFreeChunks == 0)
> + {
> + block = (SlabBlock)malloc(set->blockSize);
>
> Space after cast - maybe run pgindent over the file before submission?
> Doing that manually helps to avoid ugly damange by the per-release run
> later. I'm pretty sure there'll be a significant number of changes.
>

Will do.

>
>
> + if (block->nfree == 0)
> + block->firstFreeChunk = set->chunksPerBlock;
> + else
> + {
> + /* look for the next free chunk in the block, after the first one */
> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
> + {
> + int byte = block->firstFreeChunk / 8;
> + int bit = block->firstFreeChunk % 8;
> +
> + /* stop when you find 0 (unused chunk) */
> + if (! (block->bitmapptr[byte] & (0x01 << bit)))
> + break;
> + }
>
> I haven't profiled (or even compiled) this patchset yet, but FWIW, in
> the tuple deforming code, I could measure a noticeable speedup by
> accessing bitmap-bytes in the native word-size, rather than char. I'm
> *NOT* saying you should do that, but if this ever shows up as a
> bottlneck, it might be worthwhile to optimize.
>

OK, will try, although I don't expect this branch to be very hot.

> + /*
> + * And finally update minFreeChunks, i.e. the index to the block with the
> + * lowest number of free chunks. We only need to do that when the block
> + * got full (otherwise we know the current block is the right one).
> + * We'll simply walk the freelist until we find a non-empty entry.
> + */
> + if (set->minFreeChunks == 0)
> + for (idx = 1; idx <= set->chunksPerBlock; idx++)
> + if (set->freelist[idx])
> + {
> + set->minFreeChunks = idx;
> + break;
> + }
>
> Yuck. This definitely needs braces.
>

OK ;-)

thanks

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-11-14 01:35:51 Re: Physical append-only tables
Previous Message Tsunakawa, Takayuki 2016-11-13 23:46:15 Re: [RFC] Should we fix postmaster to avoid slow shutdown?