Re: EXPERIMENTAL: mmap-based memory context / allocator

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: EXPERIMENTAL: mmap-based memory context / allocator
Date: 2015-02-15 21:13:55
Message-ID: 54E10C13.9070403@2ndquadrant.com
Lists: pgsql-hackers

On 15.2.2015 21:38, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote:
>>> On 15.2.2015 20:56, Heikki Linnakangas wrote:
>>>> glibc's malloc() also uses mmap() for larger allocations. Precisely
>>>> because those allocations can then be handed back to the OS. I don't
>>>> think we'd want to use mmap() for small allocations either. Let's not
>>>> re-invent malloc()..
>
>>> malloc() does that only for allocations over MMAP_THRESHOLD, which is
>>> 128kB by default. The vast majority of blocks we allocate are <= 8kB, so
>>> mmap() almost never happens.
>
>> The problem is that mmap() is, to my knowledge, noticeably more
>> expensive than sbrk(). Especially with concurrent workloads. Which is
>> why the malloc/libc authors chose to use sbrk...
>
>> IIRC glibc malloc also batches several allocations into mmap()ed
>> areas after some time.
>
> Keep in mind also that aset.c doubles the request size every time it
> goes back to malloc() for some more space for a given context. So you
> get up to 128kB pretty quickly.

That's true, so for sufficiently large contexts we're already using
mmap() indirectly, through libc. Some contexts are capped at 8kB blocks
(ALLOCSET_SMALL_MAXSIZE), but those are a minority.
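
To make the doubling concrete, here is a minimal sketch. It is not code
from aset.c: the function names are hypothetical, and it assumes an 8kB
first block and the 128kB threshold mentioned above. It only counts how
many doublings it takes for a block request to reach the threshold, and
how many bytes sit in the smaller blocks that were requested before that.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 1-based index of the first block whose size
 * reaches `threshold`, when the first block is `init_size` bytes and
 * every later block is twice the previous one. Returns -1 for a zero
 * initial size; stops early if doubling would overflow size_t.
 */
static int
blocks_until_threshold(size_t init_size, size_t threshold)
{
    size_t size = init_size;
    int    nblocks = 1;

    if (init_size == 0)
        return -1;
    while (size < threshold && size <= SIZE_MAX / 2)
    {
        size *= 2;
        nblocks++;
    }
    return nblocks;
}

/*
 * Hypothetical helper: total bytes in the blocks that are smaller than
 * `threshold`, i.e. the ones requested before the threshold is reached.
 */
static size_t
bytes_below_threshold(size_t init_size, size_t threshold)
{
    size_t size = init_size;
    size_t total = 0;

    if (init_size == 0)
        return 0;
    while (size < threshold && size <= SIZE_MAX / 2)
    {
        total += size;
        size *= 2;
    }
    return total;
}
```

With these assumptions, blocks_until_threshold(8192, 128 * 1024) returns
5 (blocks of 8, 16, 32, 64 and 128kB), and bytes_below_threshold(8192,
128 * 1024) returns 120kB, which is the 8kB-to-64kB population.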

> There will be a population of 8K-to-64K chunks that don't ever get
> returned to the OS but float back and forth between different
> MemoryContexts as those are created and deleted. I'm inclined to
> think this is fine and we don't need to improve on it.

Sure, but there are scenarios where that can't happen, because the
contexts are created 'concurrently' so the blocks can't float between
the contexts.

An example that comes to mind is array_agg() with many groups, which is
made worse by allocating the MemoryContext data in TopMemoryContext,
creating 'islands' and making it impossible to release the memory.

http://www.postgresql.org/message-id/e010519fbe83b1331ee0dfcb122a616a@fuzzy.cz
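
The 'islands' effect can be illustrated with a toy model. This is a
sketch under simplifying assumptions, not how glibc or aset.c actually
track memory: the heap is an array of equal-sized slots grown sbrk()-style,
and only the free slots above the highest live slot can be handed back
by lowering the program break. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: used[i] says whether slot i is still allocated; slot 0 is
 * the bottom of the heap. Returns the number of free slots at the top
 * of the heap, which is all that shrinking the break could release.
 */
static size_t
releasable_slots(const bool *used, size_t nslots)
{
    size_t top = nslots;

    while (top > 0 && !used[top - 1])
        top--;
    return nslots - top;
}

/* Count all free slots, whether or not they can be released. */
static size_t
free_slots(const bool *used, size_t nslots)
{
    size_t i;
    size_t nfree = 0;

    for (i = 0; i < nslots; i++)
    {
        if (!used[i])
            nfree++;
    }
    return nfree;
}
```

In this model a heap of eight slots where only the topmost one is still
live has seven free slots but zero releasable ones. If the single live
slot is at the bottom instead, all seven free slots are releasable.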

> Part of the reason for my optimism is that on glibc-based platforms,
> IME PG backends do pretty well at reducing their memory consumption
> back down to a minimal value after each query. (On other platforms,
> not so much, but arguably that's libc's fault not ours.) So I'm not
> really seeing a problem that needs fixed, and definitely not one
> that a platform-specific fix will do much for.

I certainly agree this is not something we need to fix ASAP, and that
bypassing the libc may not be the right remedy. That's why I posted it
just here (and not to the CF), and marked it as experimental.

That, however, does not mean we can't improve this somehow - from time to
time I have to deal with machines where the minimum amount of memory
assigned to a process grew over time, gradually increasing memory
pressure and eventually causing trouble. There are ways to work around
this (e.g. by reopening the connections, thus creating a new backend).

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
