From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking NTUP_PER_BUCKET
Date: 2014-07-13 19:32:28
Message-ID: 53C2DECC.2010408@fuzzy.cz
Lists: pgsql-hackers

On 11.7.2014 19:25, Tomas Vondra wrote:
> 2) walking through the tuples sequentially
> ------------------------------------------
>
> The other option is not to walk the tuples through buckets, but to
> walk through the chunks - we know the tuples are stored as
> HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by
> walking the tuples in the order they're stored, we may just rearrange
> the tuples in already allocated memory. And maybe it could be even
> faster than the current code, because of the sequential access to the
> memory (as opposed to the random access through buckets) and CPU caches.
> So I like this approach - it's simple, clean and shouldn't add any
> overhead (memory or CPU).

So, attached is a patch that implements this. It's not very complicated,
and the resulting code is surprisingly readable (IMHO comparable to the
previous code).

Basics of the patch:

(1) adds HashChunkData structure - a buffer used for dense allocation
of tuples, 16kB by default

(2) adds 'chunks_batch' into HashJoinTableData, which is a linked list
of HashChunkData

(3) the allocation itself is done in chunk_alloc - in short, it either
    allocates the tuple in the current chunk, or adds a new chunk if
    needed

(4) ExecHashTableInsert calls chunk_alloc (instead of the regular
MemoryContextAlloc)

(5) the main change is in ExecHashIncreaseNumBatches, which can no
    longer pfree individual tuples (that doesn't work with dense
    allocation), so it reads the tuples directly from the chunks and
    simply rebuilds the buckets from scratch (thanks to this we only
    need one extra chunk of memory, not a complete copy, and we don't
    need to duplicate the buckets at all)

From the tests I've done so far this seems to work perfectly - it still
saves the memory overhead, and I've seen no difference in performance.

The current patch only implements this for tuples in the main hash
table, not for skew buckets. I plan to do that, but it will require
separate chunks for each skew bucket (so we can remove it without
messing with all of them). The chunks for skew buckets should be smaller
(I'm thinking about ~4kB), because there'll be more of them, but OTOH
those buckets are for frequent values so the chunks should not remain empty.

But let's see if the current patch misses something important first.

regards
Tomas

Attachment Content-Type Size
hashjoin-alloc-v3.patch text/x-diff 9.3 KB
