Re: PATCH: two slab-like memory allocators

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-14 19:05:20
Message-ID: 19464ac8-e66b-3598-858f-29c749bec35a@2ndquadrant.com
Lists: pgsql-hackers

On 02/14/2017 03:22 AM, Andres Freund wrote:
> Hi,
>
> On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
>> On 02/11/2017 02:33 AM, Andres Freund wrote:
>>>> I have a hard time believing that the cache efficiency of linked lists
>>>> (which may or may not be real in this case) outweighs this, but if you
>>>> want to try, be my guest.
>>>
>>> I'm not following - why would there be overhead in anything for
>>> allocations bigger than 4 (or maybe 8) bytes? You can store the list
>>> (via chunk ids, not pointers) inside the chunks themselves, where the data
>>> otherwise would be. And I don't see why you'd need a doubly linked
>>> list, as the only operations that are needed are to push to the front of
>>> the list, and to pop from the front of the list - and both operations
>>> are simple to do with a singly linked list?
>>>
>>
>> Oh! I had not considered storing the chunk indexes (for linked lists) in
>> the chunk itself, which obviously eliminates the overhead concerns, and
>> you're right that a singly-linked list should be enough.
>>
>> There will be some minimum-chunk-size requirement, depending on the block
>> size/chunk size. I wonder whether it makes sense to try to be smart and make
>> it dynamic, so that we only require 1B or 2B for cases when only that many
>> chunks fit into a block, or just say that it's 4B and be done with it.
>
> I doubt it's worth it - it seems likely that the added branch is more
> noticeable overall than the possible savings of 3 bytes. Also, won't the
> space be lost due to alignment *anyway*?
> + /* chunk, including SLAB header (both addresses nicely aligned) */
> + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
> use a plain slist - no point in being more careful than that.
>

Hmm, I think you're right.
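
Just to make sure we're talking about the same thing, here is roughly what
I now have in mind - plain pointers stored in the free chunks themselves,
singly linked, with push/pop only at the front. This is just a sketch with
made-up names, not the actual patch code:

#include <stddef.h>   /* for NULL */

/*
 * Sketch only: the per-block freelist lives inside the free chunks
 * themselves, so it costs no memory beyond the chunks. Any MAXALIGN-ed
 * chunk is large enough to hold the pointer, which is what the
 * Assert(MAXIMUM_ALIGNOF >= sizeof(...)) you suggest would guarantee.
 */
typedef struct SlabFreeChunk
{
    struct SlabFreeChunk *next;   /* overlays the chunk's data area while free */
} SlabFreeChunk;

typedef struct SlabFreeList
{
    SlabFreeChunk *head;          /* singly linked, front-only push/pop */
} SlabFreeList;

/* pfree() path: push the freed chunk onto the front of the block's list */
static void
slab_freelist_push(SlabFreeList *list, void *chunk)
{
    SlabFreeChunk *c = (SlabFreeChunk *) chunk;

    c->next = list->head;
    list->head = c;
}

/* palloc() path: pop a chunk from the front, NULL if the block has no free chunks */
static void *
slab_freelist_pop(SlabFreeList *list)
{
    SlabFreeChunk *c = list->head;

    if (c != NULL)
        list->head = c->next;
    return (void *) c;
}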

>
>> I mean 2^32 chunks ought to be enough for anyone, right?
>
> Yea, that seems enough; but given the alignment thing pointed out above,
> I think we can just use plain pointers - and that definitely should be
> enough :P
>

People in the year 2078: Why the hell did they only use 32 bits? Wasn't it
obvious we'd have tiny computers with 32EB of RAM? ;-)

>
>> I'm still not buying the cache efficiency argument, though. One of
>> the reasons is that the implementation prefers blocks with fewer
>> free chunks when handling palloc(), so pfree() is making the block
>> less likely to be chosen by the next palloc().
>
> That'll possibly de-optimize L1, but for L2 usage the higher density
> seems like it'll be a win. All this memory is only accessed by a
> single backend, so packing as densely as possible is good.
>
>
>>> If so, if you have e.g. 8 byte allocations and 64kb sized blocks,
>>> you end up with an array of 1024 doubly linked lists, which'll
>>> take up 64kb on its own. And there are certainly scenarios where
>>> even bigger block sizes could make sense. That's both memory
>>> overhead, and runtime overhead, because at reset-time we'll have
>>> to check the whole array (which'll presumably largely be empty
>>> lists). Resetting is a pretty common path...
>>>
>>
>> True, but it's not entirely clear if resetting is common for the
>> paths where we use those new allocators.
>
> That's fair enough. There's still the memory overhead, but I guess
> we can also live with that.
>

Right. My ambition was not to develop another general-purpose memory
context that would work perfectly for everything, but something that
works (better than the current code) for places like reorderbuffer.

>
>> Also, if we accept that it might be a problem, what other solution do you
>> propose? I don't think just merging everything into a single list is a good
>> idea, for the reasons I explained before (it might make the resets somewhat
>> less expensive, but it'll make pfree() more expensive).
>>
>
> Now that I think about it, a binary heap, as suggested elsewhere, isn't
> entirely trivial to use for this - it's more or less trivial to "fix"
> the heap after changing an element's value, but it's harder to find that
> element first.
>
> But a two-level list approach seems like it could work quite well -
> basically a simplified skip-list. A top-level list that has the
> property that all the elements have a distinct #free, and then hanging
> off those are sub-lists for all the other blocks with the same number
> of free chunks.
>
> I think that'd be a better implementation, but I can understand if you
> don't immediately want to go there.
>

I don't want to go there. I'm not all that interested in reorderbuffer,
to be honest, and this started more as a "Hold my beer!" hack, after a
midnight discussion with Petr, than a seriously meant patch. I've
already spent like 100x more time on it than I expected.
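
For the record, though, this is how I understand the structure you're
describing - just a sketch with made-up names, not something I'm promising
to implement:

/*
 * Two-level freelist sketch: the top-level list has one entry per distinct
 * free-chunk count, and each entry carries the sub-list of all blocks that
 * currently have that count.
 */
typedef struct SlabBlockStub
{
    struct SlabBlockStub *next;     /* next block with the same nfree */
    int                   nfree;    /* number of free chunks in this block */
} SlabBlockStub;

typedef struct SlabFreeCountEntry
{
    struct SlabFreeCountEntry *next;   /* next distinct nfree value */
    int                        nfree;  /* the free-chunk count this entry tracks */
    SlabBlockStub             *blocks; /* all blocks with exactly nfree free chunks */
} SlabFreeCountEntry;

/*
 * Keeping the top-level list sorted by nfree (smallest first) would preserve
 * the current behaviour of preferring the fullest blocks on palloc(), and a
 * reset only walks the entries that actually exist, not a big, mostly-empty
 * array.
 */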

>
>> What might work is replacing the array with a list, though. So we'd have a
>> list of lists, which would eliminate the array overhead.
>
> That seems likely to be significantly worse, because a) iteration is
> more expensive b) accessing the relevant list to move between two
> different "freecount" lists would be O(n).
>

Oh, right, I hadn't realized we won't know the current head of the
list, so we'd have to search for it. OTOH, we could replace it with a
small hash table, which would reduce the lookup time because we'd have
to search only in a single bin.
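
Something along these lines, very roughly (hypothetical names, just to
illustrate what I mean by binning the per-freecount lists):

#include <stddef.h>   /* for NULL */

#define SLAB_FREECOUNT_BINS 64          /* arbitrary, illustration only */

typedef struct SlabFreeCountBin
{
    struct SlabFreeCountBin *next;      /* next entry hashed into the same bin */
    int                      nfree;     /* free-chunk count this entry tracks */
    void                    *blocks;    /* head of the list of blocks with that count */
} SlabFreeCountBin;

typedef struct SlabFreeCountTable
{
    SlabFreeCountBin *bins[SLAB_FREECOUNT_BINS];
} SlabFreeCountTable;

/*
 * When a block's free count changes from f to f+1 (pfree) or f-1 (palloc),
 * we need the list heads for both counts; instead of walking one long list
 * of all the counts, we only scan a single hash bin.
 */
static SlabFreeCountBin *
slab_freecount_lookup(SlabFreeCountTable *ht, int nfree)
{
    SlabFreeCountBin *e = ht->bins[nfree % SLAB_FREECOUNT_BINS];

    while (e != NULL && e->nfree != nfree)
        e = e->next;
    return e;
}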

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
