Re: Hash Join cost estimates

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash Join cost estimates
Date: 2013-04-04 20:16:12
Message-ID: 20130404201612.GM4361@tamriel.snowman.net
Lists: pgsql-hackers

* Stephen Frost (sfrost(at)snowman(dot)net) wrote:
> It does look like reducing bucket depth, as I outlined before through
> the use of a 2-level hashing system, might help speed up
> ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
> entries to consider instead of more. Along those same lines, I really
> wonder if we're being too generous wrt the bucket-depth goal of '10'
> instead of, say, '1', especially when we've got plenty of work_mem
> available.

Rerunning with a minimally configured build (only --enable-openssl
and --enable-debug passed to configure) and NTUP_PER_BUCKET set to '1'
results in a couple of interesting things:

First, the planner actually picks the plan to hash the small table and
seqscan the big one. That also, finally, turns out to be *faster* for
this test case.
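
For reference, the change being tested here amounts to nothing more than
editing the compile-time constant in nodeHash.c (shown as a sketch; the
comment is paraphrased, not the exact source):

/*
 * src/backend/executor/nodeHash.c (sketch of the edit)
 *
 * NTUP_PER_BUCKET is the target number of hashtable entries per bucket
 * that ExecChooseHashTableSize aims for when sizing the bucket array.
 */
#define NTUP_PER_BUCKET			1		/* stock value is 10 */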

explain analyze results here:
Hash small table / seqscan big table: http://explain.depesz.com/s/nP1
Hash big table / seqscan small table: http://explain.depesz.com/s/AUv

Here are the oprofile reports:

Hash small table / seqscan big table:
samples cum. samples % cum. % linenr info image name symbol name
39023 39023 52.8574 52.8574 nodeHash.c:915 postgres ExecScanHashBucket
3743 42766 5.0700 57.9273 xact.c:682 postgres TransactionIdIsCurrentTransactionId
3110 45876 4.2126 62.1399 nodeHashjoin.c:63 postgres ExecHashJoin
2561 48437 3.4689 65.6088 heapam.c:711 postgres heapgettup_pagemode
2427 50864 3.2874 68.8962 heapam.c:300 postgres heapgetpage
2395 53259 3.2441 72.1403 heaptuple.c:1028 postgres slot_deform_tuple
2395 55654 3.2441 75.3843 heaptuple.c:1135 postgres slot_getattr
2383 58037 3.2278 78.6122 nodeHash.c:786 postgres ExecHashGetHashValue
1811 59848 2.4530 81.0652 tqual.c:1044 postgres HeapTupleSatisfiesMVCC
1796 61644 2.4327 83.4979 execScan.c:111 postgres ExecScan
1298 62942 1.7582 85.2561 hashfunc.c:517 postgres hash_uint32
1274 64216 1.7257 86.9817 execProcnode.c:356 postgres ExecProcNode
1011 65227 1.3694 88.3511 heapam.c:1453 postgres heap_getnext
905 66132 1.2258 89.5770 execTuples.c:333 postgres ExecStoreTuple
858 66990 1.1622 90.7392 fmgr.c:1291 postgres FunctionCall1Coll
835 67825 1.1310 91.8702 execQual.c:668 postgres ExecEvalScalarVarFast
834 68659 1.1297 92.9999 mcxt.c:126 postgres MemoryContextReset
818 69477 1.1080 94.1078 nodeSeqscan.c:48 postgres SeqNext

Hash big table / seqscan small table:
samples cum. samples % cum. % linenr info image name symbol name
38612 38612 41.2901 41.2901 nodeHash.c:709 postgres ExecHashTableInsert
7435 46047 7.9507 49.2408 (no location information) no-vmlinux /no-vmlinux
4900 50947 5.2399 54.4806 aset.c:563 postgres AllocSetAlloc
3803 54750 4.0668 58.5474 xact.c:682 postgres TransactionIdIsCurrentTransactionId
3335 58085 3.5663 62.1137 heapam.c:711 postgres heapgettup_pagemode
2532 60617 2.7076 64.8213 nodeHash.c:786 postgres ExecHashGetHashValue
2523 63140 2.6980 67.5193 memcpy-ssse3-back.S:60 libc-2.15.so __memcpy_ssse3_back
2518 65658 2.6926 70.2119 heaptuple.c:1028 postgres slot_deform_tuple
2378 68036 2.5429 72.7549 heapam.c:300 postgres heapgetpage
2374 70410 2.5387 75.2935 heaptuple.c:1135 postgres slot_getattr
1852 72262 1.9805 77.2740 nodeHash.c:915 postgres ExecScanHashBucket
1831 74093 1.9580 79.2320 tqual.c:1044 postgres HeapTupleSatisfiesMVCC
1732 75825 1.8521 81.0841 heapam.c:1453 postgres heap_getnext
1320 77145 1.4116 82.4957 nodeHash.c:76 postgres MultiExecHash
1219 78364 1.3035 83.7992 heaptuple.c:1529 postgres minimal_tuple_from_heap_tuple
1212 79576 1.2961 85.0953 execProcnode.c:356 postgres ExecProcNode
1209 80785 1.2929 86.3881 hashfunc.c:517 postgres hash_uint32
1197 81982 1.2800 87.6682 execScan.c:111 postgres ExecScan
1139 83121 1.2180 88.8862 execTuples.c:333 postgres ExecStoreTuple
1010 84131 1.0801 89.9662 execTuples.c:662 postgres ExecFetchSlotMinimalTuple
961 85092 1.0277 90.9939 aset.c:821 postgres AllocSetFree

Looking at the opannotate output, there are two main hotspots in
ExecScanHashBucket:

12846 17.4001 : hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];

and

22100 29.9348 : hashTuple = hashTuple->next;

I'm certainly curious about those, but I'm also very interested in the
possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
depending on the work_mem setting. It's only used in
ExecChooseHashTableSize, so while making it variable or dependent on
work_mem could slow planning down a bit, it's not a per-tuple cost item.
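
To make that concrete, here's a rough sketch of what a work_mem-aware
target might look like; choose_ntup_per_bucket() and its signature are
hypothetical, and the sizing arithmetic is simplified compared to what
ExecChooseHashTableSize actually does:

#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper, sketch only: aim for ~1 tuple per bucket and back
 * off toward the current constant of 10 only when the larger bucket-header
 * array would no longer fit alongside the tuples in the work_mem budget.
 */
int
choose_ntup_per_bucket(double ntuples, size_t tupsize, size_t hash_table_bytes)
{
	int			target;

	for (target = 1; target < 10; target++)
	{
		double		nbuckets = ceil(ntuples / target);
		double		bytes = ntuples * tupsize + nbuckets * sizeof(void *);

		if (bytes <= (double) hash_table_bytes)
			break;				/* this bucket depth fits in the budget */
	}
	return target;
}

Since anything along these lines stays confined to ExecChooseHashTableSize,
the cost is paid once per hash node rather than per tuple.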

Thoughts?

Thanks,

Stephen
