Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-18 23:00:08
Message-ID: 54935C78.8020709@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/18/14, 12:48 PM, Tom Lane wrote:
> I wrote:
>> Here's a proposed patch along this line. I left in oid_hash (in the
>> form of a macro) so that this does not cause any API break for existing
>> third-party modules. However, no callers in our own code directly
>> refer to tag_hash or oid_hash anymore.
>
> Committed that version after some further comment wordsmithing.
>
> On Teodor's original test cases, I see about 8% speedup compared to
> the 4%-ish numbers he originally reported. This may be random variation
> or it might mean that we got a win someplace else besides tidbitmap.c.
> I've not tried to sleuth it down exactly. I am wondering though if
> this suggests that it'd be worth our while to add a similar fast path
> for 8-byte hash keys. That would be quite painless to add now (with
> the exception of actually coding the fast hash function, perhaps).

I stuck an elog in to report when a hash wasn't found, and came up with these results (first number is how many times a hash wasn't found, second is keysize). I don't see a lot of 8's in there...

Second set of numbers is the same thing, except counting every time we looked for the hash value, regardless of result. (Both cases generated by running make check.)

BTW, as part of this I found some hash values were hit over 30k times in the first case. I don't know if that means a boatload of collisions or something else. Command to generate that data is: $egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,4|sort|uniq -c|cut -d' ' -f1|sort -n|tail

Before hitting the raw data, here is a summary of hash searches by key size (generated by patch 1):

43 48
327 256
1411 416
9321 136
12436 12
31270 64
107017 8
203127 16
628107 4
2201582 20 -- Mostly LOCALLOCK and Shared Buffer

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 2)
581 Analyzed elements table|8
1141 Analyzed lexemes table|16
46 Array distinct element count table|4
167 Attopt cache|8
26 Btree proof lookup cache|8
95 CFuncHash|4
2387 Combo CIDs|8
99 Databases hash|4
22 Event Trigger Cache|4
85 JoinRelHashTable|8
460690 LOCALLOCK hash|20
1 LOCALLOCK hash|LOCALLOCK hash
49608 LOCK hash|16
951 Local Buffer Lookup Table|20
5 Local predicate lock|16
581 Operator class cache|4
1658 Operator lookup cache|136
301 PLpgSQL function cache|416
5 PREDICATELOCK hash|16
14 PREDICATELOCKTARGET hash|16
52278 PROCLOCK hash|16
1064 Pending Ops Table|12
14790 Per-database table|4
15297 Portal hash|64
21 Prepared Queries|64
126 PrivateRefCount|4
4 RI compare cache|8
55 RI constraint cache|4
90 RI query cache|8
72 Record information cache|64
24040 Relcache by OID|4
742 RelfilenodeMap cache|8
21 Rendezvous variable hash|64
4 Rewrite / Old to new tid map|12
49 Rewrite / Unresolved ctids|12
5 SERIALIZABLEXID hash|4
60 Sequence values|4
9603 Shared Buffer Lookup Table|20
43 ShmemIndex|48
5335 TIDBitmap|4
138 TableSpace cache|4
16516 Temporary table of OIDs|4
169 Timezones|256
6 Tsearch configuration cache|4
4 Tsearch dictionary cache|4
1 Tsearch parser cache|4
11550 TupleHashTable|8
327 Type information cache|4
59 json object hashtable|64
17430 smgr relation table|16

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 1)
2250 Analyzed elements table|8
28817 Analyzed lexemes table|16
428 Array distinct element count table|4
447 Attopt cache|8
224 Btree proof lookup cache|8
1363 CFuncHash|4
5219 Combo CIDs|8
2415 Databases hash|4
12063 Event Trigger Cache|4
178 JoinRelHashTable|8
990753 LOCALLOCK hash|20
72129 LOCK hash|16
19858 Local Buffer Lookup Table|20
2 Local Buffer Lookup Table|LOG: Hashtable
7 Local predicate lock|16
4557 Operator class cache|4
9321 Operator lookup cache|136
1 Operator lookup cache|136LOG: Hashtable
1411 PLpgSQL function cache|416
8 PREDICATELOCK hash|16
106 PREDICATELOCKTARGET hash|16
73102 PROCLOCK hash|16
10929 Pending Ops Table|12
23534 Per-database table|4
29502 Portal hash|64
233 Prepared Queries|64
689 PrivateRefCount|4
91 RI compare cache|8
310 RI constraint cache|4
289 RI query cache|8
1471 Record information cache|64
1 Record information cache|64LOG: Hashtable
1 Record information cache|6LOG: Hashtable
426050 Relcache by OID|4
1308 RelfilenodeMap cache|8
12 Rendezvous variable hash|64
24 Rewrite / Old to new tid map|12
1483 Rewrite / Unresolved ctids|12
12 SERIALIZABLEXID hash|4
263 Sequence values|4
1190971 Shared Buffer Lookup Table|20
30 Shared Buffer Lookup Table|20LOG: Hashtable
20 Shared Buffer Lookup Table|2LOG: Hashtable
1 Shared Buffer Lookup Table|Event Trigger Cache
27 Shared Buffer Lookup Table|LOCALLOCK hash
2 Shared Buffer Lookup Table|LOCK hash
14 Shared Buffer Lookup Table|LOG: Hashtable
9 Shared Buffer Lookup Table|PROCLOCK hash
10 Shared Buffer Lookup Table|Relcache by OID
20 Shared Buffer Lookup Table|Shared Buffer Lookup Table
3 Shared Buffer Lookup Table|TIDBitmap
1 Shared Buffer Lookup Table|smgr relation table
43 ShmemIndex|48
104683 TIDBitmap|4
17 TOAST to main relid map|4
19542 TableSpace cache|4
14058 Temporary table of OIDs|4
327 Timezones|256
5 Tsearch configuration cache|4
68 Tsearch dictionary cache|4
3 Tsearch parser cache|4
1 TupleHashTable| hash
97011 TupleHashTable|8
18045 Type information cache|4
2 db hash|4
52 json object hashtable|64
28958 smgr relation table|16

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment Content-Type Size
1.patch text/plain 452 bytes
2.patch text/plain 558 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-12-18 23:05:01 Re: pg_regress writes into source tree
Previous Message Jim Nasby 2014-12-18 22:05:23 Re: Proposal: Log inability to lock pages during vacuum