Re: PATCH: two slab-like memory allocators

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-09 21:37:32
Message-ID: 20170209213732.pykbsixeezcthotr@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
> src/backend/utils/mmgr/Makefile | 2 +-
> src/backend/utils/mmgr/aset.c | 115 +--------------------------------
> src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
> src/include/utils/memdebug.h | 22 +++++++
> 4 files changed, 156 insertions(+), 114 deletions(-)
> create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower? Seems better to put some of them
in a header as static inlines (not randomize, but the rest).

> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> From: Tomas Vondra <tomas(at)2ndquadrant(dot)com>
> Date: Wed, 30 Nov 2016 15:36:23 +0100
> Subject: [PATCH 2/3] slab allocator
>
> ---
> src/backend/replication/logical/reorderbuffer.c | 82 +--
> src/backend/utils/mmgr/Makefile | 2 +-
> src/backend/utils/mmgr/slab.c | 803 ++++++++++++++++++++++++
> src/include/nodes/memnodes.h | 2 +-
> src/include/nodes/nodes.h | 1 +
> src/include/replication/reorderbuffer.h | 15 +-
> src/include/utils/memutils.h | 9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.

> +/*-------------------------------------------------------------------------
> + *
> + * slab.c
> + * SLAB allocator definitions.
> + *
> + * SLAB is a custom MemoryContext implementation designed for cases of
> + * equally-sized objects.
> + *
> + *
> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Bump to 2017, before a committer forgets it.

> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + * The constant allocation size allows significant simplification and various
> + * optimizations. Firstly, we can get rid of the doubling and carve the blocks
> + * into chunks of exactly the right size (plus alignment), not wasting memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.

> + * Each block includes a simple bitmap tracking which chunks are used/free.
> + * This makes it trivial to check if all chunks on the block are free, and
> + * eventually free the whole block (which is almost impossible with a global
> + * freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency. A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?
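
To make that concrete, here's a standalone, untested toy sketch of what
I mean (names made up, not patch code): each free chunk's first four
bytes store the index of the next free chunk, so a block only needs
firstFree/nfree, and both alloc and free are O(1):

    #include <stddef.h>
    #include <stdint.h>

    #define CHUNKS_PER_BLOCK 128
    #define CHUNK_SIZE       64     /* must be >= sizeof(int32_t) */

    typedef struct Block
    {
        int32_t     firstFree;      /* index of first free chunk, or -1 */
        int32_t     nfree;          /* number of free chunks */
        char        chunks[CHUNKS_PER_BLOCK][CHUNK_SIZE];
    } Block;

    static void
    block_init(Block *b)
    {
        int32_t     i;

        b->firstFree = 0;
        b->nfree = CHUNKS_PER_BLOCK;

        /* thread all chunks onto the freelist, -1 terminates it */
        for (i = 0; i < CHUNKS_PER_BLOCK; i++)
            *(int32_t *) b->chunks[i] = (i + 1 < CHUNKS_PER_BLOCK) ? i + 1 : -1;
    }

    static void *
    block_alloc(Block *b)
    {
        int32_t     idx = b->firstFree;

        if (idx < 0)
            return NULL;            /* block is full */

        b->firstFree = *(int32_t *) b->chunks[idx];    /* pop the head */
        b->nfree--;
        return b->chunks[idx];
    }

    static void
    block_free(Block *b, void *ptr)
    {
        int32_t     idx = (int32_t) (((char *) ptr - b->chunks[0]) / CHUNK_SIZE);

        *(int32_t *) b->chunks[idx] = b->firstFree;    /* push onto freelist */
        b->firstFree = idx;
        b->nfree++;
    }

block_free() pushing onto the head is what gives the LIFO reuse order.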

> + * At the context level, we use 'freelist' to track blocks ordered by number
> + * of free chunks, starting with blocks having a single allocated chunk, and
> + * with completely full blocks on the tail.

Why that way round? Filling blocks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks? Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?

Thirdly, isn't that approach going to result in quite a long freelists
array when you have small items and a decent block size? And small
items with a decent block size seem like a fairly reasonable
combination?
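
Back-of-the-envelope, with made-up sizes: an 8kB block and 32 byte
chunks gives roughly 8192 / 32 = 256 chunks per block, i.e. 257
freelist entries, and at 16 bytes per dlist_head that's ~4kB of list
heads per slab context - half a block's worth just for the array.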

> + * This also allows various optimizations - for example when searching for
> + * a free chunk, the allocator reuses space from the most full blocks first,
> + * in the hope that some of the less full blocks will get completely empty
> + * (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.

> + * For each block, we maintain a pointer to the first free chunk - this is quite
> + * cheap and allows us to skip all the preceding used chunks, eliminating
> + * a significant number of lookups in many common usage patterns. In the worst
> + * case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.

> +
> +#include "postgres.h"
> +
> +#include "utils/memdebug.h"
> +#include "utils/memutils.h"
> +#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.

> +/*
> + * SlabPointer
> + * Aligned pointer which may be a member of an allocation set.
> + */
> +typedef void *SlabPointer;
> +typedef SlabContext *Slab;

I personally won't commit this with pointer-hiding typedefs. If
somebody else does, I can live with it, but for me it's in bad enough
taste that I won't.

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> + MemoryContextData header; /* Standard memory-context fields */
> + /* Allocation parameters for this context: */
> + Size chunkSize; /* chunk size */
> + Size fullChunkSize; /* chunk size including header and alignment */
> + Size blockSize; /* block size */
> + int chunksPerBlock; /* number of chunks per block */
> + int minFreeChunks; /* min number of free chunks in any block */
> + int nblocks; /* number of blocks allocated */
> + /* blocks with free space, grouped by number of free chunks: */
> + dlist_head freelist[FLEXIBLE_ARRAY_MEMBER];
> +} SlabContext;
> +

Why aren't these ints something unsigned?

> +/*
> + * SlabIsValid
> + * True iff set is valid allocation set.
> + */
> +#define SlabIsValid(set) PointerIsValid(set)

It's not your fault, but this "iff" is obviously a lot stronger than the
actual test ;). I seriously doubt this macro is worth anything...

> +/*
> + * SlabReset
> + * Frees all memory which is allocated in the given set.
> + *
> + * The code simply frees all the blocks in the context - we don't keep any
> + * keeper blocks or anything like that.
> + */

Why don't we? Seems quite worthwhile? Thinking about this, won't this
result in a drastic increase of system malloc/mmap/brk traffic when
there are lots of short transactions in reorderbuffer?

> +static void
> +SlabReset(MemoryContext context)
> +{
> + /* walk over freelists and free the blocks */
> + for (i = 0; i <= set->chunksPerBlock; i++)
> + {
> + dlist_mutable_iter miter;
> +
> + dlist_foreach_modify(miter, &set->freelist[i])
> + {
> + SlabBlock block = dlist_container(SlabBlockData, node, miter.cur);
> +
> + dlist_delete(miter.cur);
> +
> + /* Normal case, release the block */

What does "normal case" refer to here? Given that there's no alternative
case...
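
Re the keeper-block question above - it wouldn't take much code.
Rough, untested sketch along the lines of the quoted loop (field and
type names taken from the patch, the rest guessed):

    static void
    SlabReset(MemoryContext context)
    {
        SlabContext *set = (SlabContext *) context;
        bool        kept_one = false;
        int         i;

        /* walk over freelists and free the blocks, keeping one empty block */
        for (i = 0; i <= set->chunksPerBlock; i++)
        {
            dlist_mutable_iter miter;

            dlist_foreach_modify(miter, &set->freelist[i])
            {
                SlabBlock   block = dlist_container(SlabBlockData, node, miter.cur);

                /* freelist[chunksPerBlock] holds completely free blocks */
                if (i == set->chunksPerBlock && !kept_one)
                {
                    kept_one = true;    /* retain as keeper block */
                    continue;
                }

                dlist_delete(miter.cur);
                set->nblocks--;
                free(block);
            }
        }
    }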

> + /*
> + * We need to update index of the next free chunk on the block. If we used
> + * the last free chunk on this block, set it to chunksPerBlock (which is
> + * not a valid chunk index). Otherwise look for the next chunk - we know
> + * that it has to be above the current firstFreeChunk value, thanks to how
> + * we maintain firstFreeChunk here and in SlabFree().
> + */
> + if (block->nfree == 0)
> + block->firstFreeChunk = set->chunksPerBlock;
> + else
> + {
> + /* look for the next free chunk in the block, after the first one */
> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
> + {
> + int byte = block->firstFreeChunk / 8;
> + int bit = block->firstFreeChunk % 8;
> +
> + /* stop when you find 0 (unused chunk) */
> + if (!(block->bitmapptr[byte] & (0x01 << bit)))
> + break;
> + }
> +
> + /* must have found the free chunk */
> + Assert(block->firstFreeChunk != set->chunksPerBlock);
> + }

This and the previous code just reaffirm my opinion that a bitmap is
not the best structure here.

> + /* move the whole block to the right place in the freelist */
> + dlist_delete(&block->node);
> + dlist_push_head(&set->freelist[block->nfree], &block->node);

Hm. What if we, instead of the array of doubly linked lists, just kept
a single linked list of blocks, and kept that list sorted by the number
of free chunks? Given that freeing / allocation never changes the
number of allocated chunks by more than 1, we'd never have to move an
entry far in that list to keep it sorted.
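
Standalone toy illustration of that (untested, not using lib/ilist.h):
after nfree changes by one, the block just gets bubbled towards the
head or tail until the ordering holds again:

    #include <stddef.h>

    typedef struct Block
    {
        struct Block *prev;
        struct Block *next;
        int         nfree;          /* free chunks in this block */
    } Block;

    typedef struct BlockList
    {
        Block      *head;           /* fullest block (smallest nfree) first */
        Block      *tail;
    } BlockList;

    /* unlink 'b' and re-insert it right after 'pos' (pos == NULL: new head) */
    static void
    move_after(BlockList *list, Block *b, Block *pos)
    {
        if (b->prev) b->prev->next = b->next; else list->head = b->next;
        if (b->next) b->next->prev = b->prev; else list->tail = b->prev;

        b->prev = pos;
        b->next = pos ? pos->next : list->head;
        if (b->next) b->next->prev = b; else list->tail = b;
        if (pos) pos->next = b; else list->head = b;
    }

    /* restore the ordering after b->nfree changed by +1 (free) or -1 (alloc) */
    static void
    resort_block(BlockList *list, Block *b)
    {
        while (b->next && b->next->nfree < b->nfree)    /* got freer: tailwards */
            move_after(list, b, b->next);
        while (b->prev && b->prev->nfree > b->nfree)    /* got fuller: headwards */
            move_after(list, b, b->prev->prev);
    }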

> +/*
> + * SlabRealloc
> + * As Slab is designed for allocating equally-sized chunks of memory, it
> + * can't really do an actual realloc.
> + *
> + * We try to be gentle and allow calls with exactly the same size as in that
> + * case we can simply return the same chunk. When the size differs, we fail
> + * with assert failure or return NULL.
> + *
> + * We might even support cases with (size < chunkSize). That however seems
> + * rather pointless - Slab is meant for chunks of constant size, and moreover
> + * realloc is usually used to enlarge the chunk.
> + *
> + * XXX Perhaps we should not be gentle at all and simply fail in all cases,
> + * to eliminate the (mostly pointless) uncertainty.

I think I'm in favor of that. This seems more likely to hide a bug
than to actually be helpful.
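
I.e. something as simple as this (sketch, not tested):

    static void *
    SlabRealloc(MemoryContext context, void *pointer, Size size)
    {
        elog(ERROR, "slab allocator does not support realloc()");
        return NULL;            /* keep compiler quiet */
    }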

Regards,

Andres
