Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date: 2014-12-20 20:13:37
Message-ID: 5495D871.5060509@BlueTreble.com
Lists: pgsql-hackers

On 12/20/14, 11:51 AM, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value, i.e. something analogous to:
>>> hash_any(SharedBufHash, (((rot(forkNum, 2) | dbNode) ^ relNode) << 32) | blockNum)
>>>
>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>
>> I don't think that'd improve anything. Jenkins' hash does have quite good
>> mixing properties; I don't believe that the above would improve the
>> quality of the hash.
>
> I think what Jim is suggesting is to intentionally degrade the quality of
> the hash in order to let it be calculated a tad faster. We could do that
> but I doubt it would be a win, especially in systems with lots of buffers.
> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
> function, it improved performance even though the hash itself was slower.

Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.

I'll still take a look at fast-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64-bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...
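
For anyone following along, here is a rough sketch of the kind of field-folding I had in mind upthread, and of why it gives up hash quality. Everything in it is hypothetical: DemoBufTag, rotl32 and demo_pack_key are illustration-only names, not PostgreSQL's BufferTag or any existing function, and I'm assuming "rot" means a left rotate. It also ignores the tablespace id entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the identifying fields; NOT PostgreSQL's BufferTag. */
typedef struct DemoBufTag
{
    uint32_t db_node;
    uint32_t rel_node;
    uint32_t fork_num;   /* a small enum value in practice */
    uint32_t block_num;
} DemoBufTag;

/* Rotate a 32-bit value left by n bits; only valid for 0 < n < 32. */
static uint32_t
rotl32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/*
 * Fold the tag into a 64-bit key: the high 32 bits mix fork, database and
 * relation; the low 32 bits are the block number. The fold is lossy, so
 * distinct tags can produce the same key.
 */
static uint64_t
demo_pack_key(const DemoBufTag *tag)
{
    uint32_t high = (rotl32(tag->fork_num, 2) | tag->db_node) ^ tag->rel_node;

    return ((uint64_t) high << 32) | tag->block_num;
}
```

With this fold, a tag with db_node 1, rel_node 2, fork_num 1 and one with db_node 5, rel_node 2, fork_num 0 land on the same key for the same block, which is the sort of quality loss Tom is describing. A key that needs no hashing would have to be a lossless 64-bit identifier, which is why it comes back to changing how relation files are identified.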
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
