Re: Hash Functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-13 17:57:15
Message-ID: 13475.1494698235@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Can we think of defining separate portable hash functions which can be
>> used for the purpose of hash partitioning?

> I think that would be a good idea. I think it shouldn't even be that
> hard. By data type:

> - Integers. We'd need to make sure that we get the same results for
> the same value on big-endian and little-endian hardware, and that
> performance is good on both systems. That seems doable.

> - Floats. There may be different representations in use on different
> hardware, which could be a problem. Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no". If every bit
> of hardware we are likely to find uses basically the same
> representation of the same float value, then this shouldn't be hard.
> (Also, even if this turns out to be hard for floats, using a float as
> a partitioning key would be a surprising choice because the default
> output representation isn't even unambiguous; you need
> extra_float_digits for that.)

> - Strings. There's basically only one representation for a string.
> If we assume that the hash code only needs to be portable across
> hardware and not across encodings, a position for which I already
> argued upthread, then I think this should be manageable.

Basically, this is simply saying that you're willing to ignore the
hard cases, which reduces the problem to one of documenting the
portability limitations. You might as well not even bother with
worrying about the integer case, because porting between little-
and big-endian systems is surely far less common than cases you've
already said you're okay with blowing off.

That's not an unreasonable position to take, perhaps; doing better
than that is going to be a lot more work and it's not very clear
how much real-world benefit results. But I can't follow the point
of worrying about endianness but not encoding.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2017-05-13 18:14:16 Re: Hash Functions
Previous Message Jeff Davis 2017-05-13 17:51:27 Re: Hash Functions