Re: Hash Join cost estimates

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash Join cost estimates
Date: 2013-04-04 19:25:47
Message-ID: 20130404192547.GK4361@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> perf or oprofile reveal anything?

Here's what we get from oprofile (perhaps not too surprising):

Hash the small table / scan the big table:
samples cum. samples % cum. % linenr info image name symbol name
167374 167374 47.9491 47.9491 nodeHash.c:915 postgres ExecScanHashBucket
85041 252415 24.3624 72.3115 mcount.c:60 libc-2.15.so __mcount_internal
28370 280785 8.1274 80.4389 _mcount.S:33 libc-2.15.so mcount
15856 296641 4.5424 84.9814 (no location information) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)
6291 302932 1.8022 86.7836 xact.c:682 postgres TransactionIdIsCurrentTransactionId
4555 307487 1.3049 88.0885 instrument.c:70 postgres InstrStopNode
3849 311336 1.1027 89.1912 heapam.c:711 postgres heapgettup_pagemode
3567 314903 1.0219 90.2130 nodeHashjoin.c:63 postgres ExecHashJoin

Hash the big table / scan the small table:
samples cum. samples % cum. % linenr info image name symbol name
112060 112060 39.2123 39.2123 mcount.c:60 libc-2.15.so __mcount_internal
36547 148607 12.7886 52.0009 nodeHash.c:709 postgres ExecHashTableInsert
33570 182177 11.7469 63.7477 _mcount.S:33 libc-2.15.so mcount
16383 198560 5.7328 69.4805 (no location information) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ffffff)
13200 211760 4.6190 74.0995 (no location information) no-vmlinux /no-vmlinux
6345 218105 2.2203 76.3197 xact.c:682 postgres TransactionIdIsCurrentTransactionId
5250 223355 1.8371 78.1568 nodeHash.c:915 postgres ExecScanHashBucket
4797 228152 1.6786 79.8354 heapam.c:711 postgres heapgettup_pagemode
4661 232813 1.6310 81.4664 aset.c:563 postgres AllocSetAlloc
4588 237401 1.6054 83.0718 instrument.c:70 postgres InstrStopNode
3550 240951 1.2422 84.3140 memcpy-ssse3-back.S:60 libc-2.15.so __memcpy_ssse3_back
3013 243964 1.0543 85.3684 aset.c:1109 postgres AllocSetCheck

Looking at the 'Hash the small table / scan the big table', opannotate
claims that this is bar far the worst offender:

147588 42.2808 : hashTuple = hashTuple->next;

While most of the time in the 'Hash the big table / scan the small
table' is in:

34572 12.0975 : hashTuple->next = hashtable->buckets[bucketno];

Neither of those strike me as terribly informative though. To be
honest, I've not really played w/ oprofile all that much. Now that I've
got things set up to support this, I'd be happy to provide more info if
anyone has suggestions on how to get something more useful.

It does look like reducing bucket depth, as I outlined before through
the use of a 2-level hashing system, might help speed up
ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
entries to consider instead of more. Along those same lines, I really
wonder if we're being too generous wrt the bucket-depth goal of '10'
instead of, say, '1', especially when we've got plenty of work_mem
available.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-04-04 19:32:23 Re: Hash Join cost estimates
Previous Message Dimitri Fontaine 2013-04-04 18:53:18 Re: Multi-pass planner