PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Date: 2014-12-07 03:08:28
Message-ID: 5483C4AC.5060107@fuzzy.cz
Lists: pgsql-hackers

Hi,

back when we were discussing the hashjoin patches (now committed),
Robert proposed that maybe it'd be a good idea to sometimes increase the
number of tuples per bucket instead of batching.

That is, while initially sizing the hash table: if a hash table with
enough buckets to satisfy the NTUP_PER_BUCKET load factor does not fit
into work_mem, try a somewhat higher load factor before starting to batch.

The attached patch is an initial attempt at implementing this - it's a bit
rough around the edges, but hopefully good enough to judge the benefit of
the approach.

The default load factor is 1. The patch tries to degrade this to 2, 4 or
8 in an attempt to fit the hash table into work_mem. If even that doesn't
work, it starts batching with the default load factor. If batching becomes
necessary while the hashjoin is already running, the load factor is
switched back to the default (once we're batching, there's no point in
keeping the slower hash table).
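
To make the decision logic concrete, here's a minimal standalone sketch of
the idea (this is not the patch code - the helper names and the
per-tuple/per-bucket constants are made up purely for illustration):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NTUP_PER_BUCKET	1		/* default load factor */

/* made-up per-tuple and per-bucket overheads, just for the sketch */
#define TUPLE_BYTES		64
#define BUCKET_BYTES	8

/* estimated size of a hash table holding ntuples at a given load factor */
static uint64_t
estimate_hashtable_bytes(uint64_t ntuples, int ntup_per_bucket)
{
	uint64_t	nbuckets = (ntuples + ntup_per_bucket - 1) / ntup_per_bucket;

	return ntuples * TUPLE_BYTES + nbuckets * BUCKET_BYTES;
}

/*
 * Try load factors 1, 2, 4 and 8; if even 8 tuples/bucket don't fit into
 * work_mem, give up on the graceful degradation and batch with the
 * default load factor instead.
 */
static void
choose_hash_strategy(uint64_t ntuples, uint64_t work_mem_bytes,
					 int *ntup_per_bucket, bool *use_batching)
{
	int			factors[] = {1, 2, 4, 8};

	for (int i = 0; i < 4; i++)
	{
		if (estimate_hashtable_bytes(ntuples, factors[i]) <= work_mem_bytes)
		{
			*ntup_per_bucket = factors[i];
			*use_batching = false;
			return;
		}
	}

	*ntup_per_bucket = NTUP_PER_BUCKET;
	*use_batching = true;
}

int
main(void)
{
	int			ntup;
	bool		batch;

	choose_hash_strategy(1000000, 48ULL * 1024 * 1024, &ntup, &batch);
	printf("tuples/bucket = %d, batching = %s\n", ntup, batch ? "yes" : "no");
	return 0;
}

The patch itself of course works with the real size estimates rather than
these made-up constants, but the shape of the decision is the same.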

The patch also modifies the EXPLAIN output to show the load factor.

The results of my testing so far are rather disappointing, though.

create table a as select i from generate_series(1,1000000) s(i);
create table b as select i from generate_series(1,1000000) s(i);

analyze a;
analyze b;

select a.i, b.i from a join b on (a.i = b.i);

work_mem   batches   tuples per bucket   duration
---------------------------------------------------
  64 MB       1             1              585 ms
  46 MB       1             2              639 ms
  43 MB       1             4              794 ms
  40 MB       1             8             1075 ms
  39 MB       2             1              623 ms

So, even increasing the load factor to 2 is slower than batching.

Of course, with other examples the results may be different. For example
with a much larger outer table:

create table a as select mod(i,1000000) i
from generate_series(1,10000000) s(i);
analyze a;

work_mem   batches   tuples per bucket   duration
---------------------------------------------------
  64 MB       1             1             3904 ms
  46 MB       1             2             4692 ms
  43 MB       1             4             6499 ms
  40 MB       1             8             9264 ms
  39 MB       2             1             4054 ms

Same pattern. Of course, for "huge" outer tables it will make a
difference. For example, on a ~8GB outer table (on a machine with 8GB of
RAM), the batching causes enough I/O to make it slower than ntup=2
(roughly 80s with batching vs. 50s with ntup=2).

I'm not sure what the best way forward is - clearly, for cases that fit
into RAM (i.e. the batch temp files stay in the page cache), batching is
faster. For outer tables large enough to cause a lot of I/O, a higher load
factor may make a difference - but I'm not sure how to identify these cases.
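
Just to illustrate the kind of check I have in mind (purely a sketch -
both inputs are estimates, and getting them reliably is exactly the open
problem):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Prefer a higher load factor (instead of batching) only when the outer
 * relation is so large that the batch temp files would not stay in the
 * page cache, i.e. when the extra I/O from batching would dominate.
 */
static bool
prefer_higher_load_factor(uint64_t outer_rel_bytes, uint64_t page_cache_bytes)
{
	return outer_rel_bytes > page_cache_bytes;
}

int
main(void)
{
	/* ~8GB outer table vs. ~6GB of usable page cache on an 8GB box */
	printf("%s\n", prefer_higher_load_factor(8ULL << 30, 6ULL << 30)
		   ? "increase load factor" : "batch");
	return 0;
}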

So ISTM implementing this requires a reliable way to identify which case
we're dealing with - whether the outer table is large enough (prefer
increasing the load factor) or not (prefer batching). Until then, keeping
the current simple/predictable approach is probably better.

Of course, this also depends on additional variables (e.g. is the system
memory-stressed?).

regards
Tomas

Attachment Content-Type Size
hashjoin-graceful-v1.patch text/x-diff 10.7 KB
