sb_alloc: a new memory allocator for PostgreSQL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: sb_alloc: a new memory allocator for PostgreSQL
Date: 2014-05-06 13:04:52
Message-ID: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over the last several months, I've been working on a new memory
allocator for PostgreSQL. While it's not done, and there are
problems, I think I've reached the point where it makes sense to get
this out in front of a wider audience and get some feedback,
preferably of the sort that doesn't do any permanent damage. I
started out down this path because parallel sort will really be much
happier if it can allocate from either dynamic shared memory or
backend-local memory using the same interface. Arguably, we could
implement parallel internal sort using some kind of bespoke memory
management strategy, like packing all of the tuples into the memory
segment tightly starting from the end, and growing the tuple array
from the beginning, but then what happens if we decide to give up on
parallelism and do an external sort after all? Even if we could
engineer our way around that problem, I think there are bound to be
other things that people want to do with dynamic shared memory
segments where being able to easily allocate and free memory is
desirable.

As I got into the problem a little bit further, I realized that
AllocSetAlloc is actually pretty poor allocator for sorting. As far
as I can see, the basic goal of aset.c was to make context resets
cheap - which makes sense, because we have a lot of very short-lived
memory contexts. However, when you're sorting a lot of data, being
able to reset the context quickly is not as important as using memory
efficiently, and AllocSetAlloc is pretty terrible at that, especially
for small allocations. Each allocation is rounded up to a power of
two, and then we add 16 bytes of overhead on top of that (on 64-bit
systems), which means that palloc has 100% overhead for a large number
of 16-byte allocations, and 200% overhead for a large number of 8-byte
allocations. The 16-byte overhead doesn't hurt quite as much on big
allocations, but the rounding to a power of two can sometimes be very
bad. For example, for repeated 96-byte allocations, palloc has 50%
overhead. I decided I wanted to come up with something better.

I read through the literature and found that most modern allocators
seemed to use superblocks. A superblock is a chunk of memory, perhaps
64kB, which is carved up into a bunch of chunks of equal size. Those
chunks don't have individual headers; instead, the pointer address is
used to locate the metadata for the chunk. Requests are binned into
size classes, which are generally much more fine-grained than the
powers-of-two strategy used by AllocSetAlloc. Of course, too many
size classes can waste memory if you end up with many partially-full
superblocks, each for a different size class, but the consensus seems
to be that you make it up and more by wasting less memory within each
allocation (e.g. a 96 byte allocation can be exactly 96 bytes, rather
than 128 bytes).

I investigated whether there might be an existing allocator we could
use; I did not find anything that looked suitable. No allocator that
I ran across could handle allocation of shared memory; in fact, pretty
much all of them assumed the existence of an underlying "system
allocator" that they could use to allocate memory for their own
metadata. Those that included a system allocator (like tcmalloc)
still didn't support anything that looked like dynamic shared memory.
Most of them were also unsuitable for other reasons (e.g. tcmalloc is
written in C++). I also didn't find anything that looked like our
memory context paradigm, and in particular the ability to cheaply
reset a context, in any other allocator. So I decided to write my
own; a first cut is attached. It's not integrated with the
MemoryContext stuff yet, and very likely needs to be. There are some
functions for which I've added prototypes but not implementations; the
support for allocation from dynamic shared memory segments is just a
stub right now. And there are various other warts as well which will
need to be cleaned up in due time if this is to go anywhere.

Performance-wise, the raw allocation performance (sb_alloc) is
reasonably good. On PPC64, it lost out to palloc on 8-byte
allocations but was comparable for larger allocations; on my MacBook,
it was faster across the board. The raw deallocation performance
(sb_free) is appreciably slower than pfree; that can probably be
optimized somewhat. It's not really a fair comparison at this point,
though, among other reasons because palloc is jumping through an extra
layer of indirection. Leaving that problem to one side, I hacked up
index builds to use the new allocator and found that on REINDEX INDEX
pgbench_accounts_pkey at scale factor 100, this uses 22% less memory
than palloc, but runs about 4-7% slower on PPC64. It may be possible
to eliminate some of that with a tighter integration of the new
allocator and some more fine-tuning.

As you can probably tell, this is all somewhat preliminary and
experimental at this point, but thoughts welcome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
sballoc-v1.patch text/x-patch 133.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-05-06 13:06:52 Re: Comment patch for index-only scans
Previous Message Stephen Frost 2014-05-06 13:03:36 Re: pgaudit - an auditing extension for PostgreSQL