From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-14 19:05:20
Message-ID: 19464ac8-e66b-3598-858f-29c749bec35a@2ndquadrant.com
Lists: pgsql-hackers
On 02/14/2017 03:22 AM, Andres Freund wrote:
> Hi,
>
> On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
>> On 02/11/2017 02:33 AM, Andres Freund wrote:
>>>> I have a hard time believing that the cache efficiency of linked lists
>>>> (which may or may not be real in this case) outweighs this, but if you
>>>> want to try, be my guest.
>>>
>>> I'm not following - why would there be overhead in anything for
>>> allocations bigger than 4 (or maybe 8) bytes? You can store the list
>>> (via chunk ids, not pointers) inside the chunks itself, where data
>>> otherwise would be. And I don't see why you'd need a doubly linked
>>> list, as the only operations that are needed are to push to the front of
>>> the list, and to pop from the front of the list - and both operations
>>> are simple to do with a singly linked list?
>>>
>>
>> Oh! I had not considered storing the chunk indexes (for linked lists) in
>> the chunks themselves, which obviously eliminates the overhead concerns,
>> and you're right, a singly-linked list should be enough.
>>
>> There will be some minimum-chunk-size requirement, depending on the block
>> size/chunk size. I wonder whether it makes sense to try to be smart and make
>> it dynamic, so that we only require 1B or 2B for cases when only that many
>> chunks fit into a block, or just say that it's 4B and be done with it.
>
> I doubt it's worth it - it seems likely that the added branch is more
> noticeable overall than the possible savings of 3 bytes. Also, won't the
> space be lost due to alignment *anyway*?
> + /* chunk, including SLAB header (both addresses nicely aligned) */
> + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
> use a plain slist - no point in being more careful than that.
>
Hmm, I think you're right.
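
For readers following along, the scheme converged on here (storing the freelist links inside the free chunks themselves, so the list costs no extra memory) can be sketched roughly as below. This is a hypothetical simplification using plain pointers and invented names, not the actual patch code:

```c
#include <stddef.h>
#include <string.h>

/* Sketch: free chunks in a block form a singly-linked freelist, with the
 * "next" pointer stored in the free chunk's own payload.  This only
 * requires that chunks are at least sizeof(void *) bytes. */
#define CHUNK_SIZE	64
#define NCHUNKS		16

typedef struct Block
{
	void	   *freelist;		/* first free chunk, or NULL */
	int			nfree;			/* number of free chunks */
	char		chunks[NCHUNKS][CHUNK_SIZE];
} Block;

static void
block_init(Block *block)
{
	block->freelist = NULL;
	block->nfree = 0;

	/* push every chunk onto the freelist */
	for (int i = 0; i < NCHUNKS; i++)
	{
		void	   *chunk = block->chunks[i];

		memcpy(chunk, &block->freelist, sizeof(void *));
		block->freelist = chunk;
		block->nfree++;
	}
}

static void *
block_alloc(Block *block)
{
	void	   *chunk = block->freelist;

	if (chunk == NULL)
		return NULL;			/* block is full */

	/* pop from the front: the chunk's payload holds the next pointer */
	memcpy(&block->freelist, chunk, sizeof(void *));
	block->nfree--;
	return chunk;
}

static void
block_free(Block *block, void *chunk)
{
	/* push back onto the front of the list */
	memcpy(chunk, &block->freelist, sizeof(void *));
	block->freelist = chunk;
	block->nfree++;
}
```

Both operations touch only the front of the list, which is why a singly-linked list suffices.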
>
>> I mean 2^32 chunks ought to be enough for anyone, right?
>
> Yea, that seems enough; but given the alignment thing pointed out above,
> I think we can just use plain pointers - and that definitely should be
> enough :P
>
People in year 2078: Why the hell did they only use 32 bits? Wasn't it
obvious we'll have tiny computers with 32EB of RAM? ;-)
>
>> I'm still not buying the cache efficiency argument, though. One of
>> the reasons is that the implementation prefers blocks with fewer
>> free chunks when handling palloc(), so pfree() is making the block
>> less likely to be chosen by the next palloc().
>
> That'll possibly de-optimize L1, but for L2 usage the higher density
> seems like it'll be a win. All this memory is only accessed by a
> single backend, so packing as densely as possible is good.
>
>
>>> If so, if you have e.g. 8 byte allocations and 64kb sized blocks,
>>> you end up with an array of 1024 doubly linked lists, which'll
>>> take up 64kb on its own. And there are certainly scenarios where
>>> even bigger block sizes could make sense. That's both memory
>>> overhead, and runtime overhead, because at reset-time we'll have
>>> to check the whole array (which'll presumably largely be empty
>>> lists). Resetting is a pretty common path...
>>>
>>
>> True, but it's not entirely clear if resetting is common for the
>> paths where we use those new allocators.
>
> That's fair enough. There's still the memory overhead, but I guess
> we can also live with that.
>
Right. My ambition was not to develop another general-purpose memory
context that would work perfectly for everything, but something that
works (better than the current code) for places like reorderbuffer.
>
>> Also, if we accept that it might be a problem, what other solution you
>> propose? I don't think just merging everything into a single list is a good
>> idea, for the reasons I explained before (it might make the resets somewhat
>> less expensive, but it'll make pfree() more expensive).
>>
>
> Now that I think about it, a binary heap, as suggested elsewhere, isn't
> entirely trivial to use for this - it's more or less trivial to "fix"
> the heap after changing an element's value, but it's harder to find that
> element first.
>
> But a two-level list approach seems like it could work quite well -
> basically a simplified skip-list. A top-level list has the property
> that all its elements have a distinct #free, and hanging off each of
> those are sub-lists for all the other blocks with the same number of
> free chunks.
>
> I think that'd be a better implementation, but I can understand if you
> don't immediately want to go there.
>
I don't want to go there. I'm not all that interested in reorderbuffer,
to be honest, and this started more as a "Hold my beer!" hack, after a
midnight discussion with Petr, than a seriously meant patch. I've
already spent like 100x more time on it than I expected.
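
For reference, the two-level structure Andres describes could look roughly like the following sketch. The names (FreeBucket, SlabBlockStub) and the sorted-insert helper are invented for illustration; none of this is from the patch:

```c
#include <stddef.h>
#include <stdlib.h>

/* One top-level node ("bucket") per distinct free count; each bucket
 * holds the sub-list of all blocks sharing that count. */
typedef struct SlabBlockStub
{
	struct SlabBlockStub *next; /* next block in the same bucket */
	int			nfree;			/* free chunks in this block */
} SlabBlockStub;

typedef struct FreeBucket
{
	int			nfree;			/* the distinct free count */
	SlabBlockStub *blocks;		/* blocks with exactly nfree free chunks */
	struct FreeBucket *next;	/* bucket for the next distinct count */
} FreeBucket;

/* Insert a block into the bucket matching its free count, keeping the
 * top-level list sorted ascending (densest blocks first).  Buckets are
 * leaked here for brevity; a real implementation would recycle them. */
static FreeBucket *
bucket_insert(FreeBucket *head, SlabBlockStub *block)
{
	FreeBucket **pos = &head;

	/* walk the top-level list, one step per *distinct* free count */
	while (*pos && (*pos)->nfree < block->nfree)
		pos = &(*pos)->next;

	if (*pos == NULL || (*pos)->nfree != block->nfree)
	{
		/* no bucket with this count yet, create one */
		FreeBucket *b = malloc(sizeof(FreeBucket));

		b->nfree = block->nfree;
		b->blocks = NULL;
		b->next = *pos;
		*pos = b;
	}

	/* push onto the bucket's sub-list */
	block->next = (*pos)->blocks;
	(*pos)->blocks = block;
	return head;
}
```

The point is that iteration (e.g. at reset time) visits only buckets that actually exist, rather than scanning an array with one slot per possible free count.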
>
>> What might work is replacing the array with a list, though. So we'd have a
>> list of lists, which would eliminate the array overhead.
>
> That seems likely to be significantly worse, because a) iteration is
> more expensive b) accessing the relevant list to move between two
> different "freecount" lists would be O(n).
>
Oh, right, I hadn't realized we won't know the current head of the
list, so we'd have to search for it. OTOH, we could replace it with a
small hash table, which would reduce the lookup time because we'd have
to search only in a single bin.
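
The hash-table idea above might be sketched like this; the bin count and names are hypothetical:

```c
#include <stddef.h>

/* Sketch: hash blocks by their free count so a lookup only scans one
 * bin instead of an array slot per possible count. */
#define NBINS 16

typedef struct BlockNode
{
	struct BlockNode *next;		/* next block in the same bin */
	int			nfree;			/* free chunks in this block */
} BlockNode;

static BlockNode *bins[NBINS];	/* zero-initialized: all bins empty */

static void
freelist_push(BlockNode *block)
{
	int			bin = block->nfree % NBINS;

	block->next = bins[bin];
	bins[bin] = block;
}

static BlockNode *
freelist_find(int nfree)
{
	/* only the chains that hash to this bin need to be searched */
	for (BlockNode *b = bins[nfree % NBINS]; b; b = b->next)
	{
		if (b->nfree == nfree)
			return b;
	}
	return NULL;
}
```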
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services