Re: [POC] hash partitioning

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: amul sul <sulamul(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-10-12 19:43:53
Message-ID: 20171012194353.3nealiykmjura4bi@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-10-12 10:05:26 -0400, Robert Haas wrote:
> On Thu, Oct 12, 2017 at 9:08 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> > How about combining high 32 bits and the low 32 bits separately as shown below?
> >
> > static inline uint64
> > hash_combine64(uint64 a, uint64 b)
> > {
> > return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
> > | hash_combine((unit32) a, (unit32) b));
> > }
>
> I doubt that's the best approach, but I don't have something specific
> to recommend.

Yea, that doesn't look great. There's basically no intermixing between
low and high 32 bits. going on. We probably should just expand the
concept of the 32 bit function:

static inline uint32
hash_combine32(uint32 a, uint32 b)
{
/* 0x9e3779b9 is the golden ratio reciprocal */
a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
return a;
}

to something roughly like:

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
/* 0x49A0F4DD15E5A8E3 is 64bit random data */
a ^= b + 0x49A0F4DD15E5A8E3 + (a << 54) + (a >> 7);
return a;
}

In contrast to the 32 bit version's fancy use of the golden ratio
reciprocal as a constant I went brute force, and just used 64bit of
/dev/random. From my understanding the important property is that bits
are independent from each other, nothing else.

The shift widths are fairly random, but they should bring in enough bit
perturbation when mixing in only 32bit of hash value (i.e
0x00000000xxxxxxxx).

Are we going to rely on the the combine function to stay the same
forever after?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-10-12 19:50:51 Re: oversight in EphemeralNamedRelation support
Previous Message Gourav Kumar 2017-10-12 19:30:30 Re: How does postgres store the join predicate for a relation in a given query