Re: Hash Functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-13 14:29:09
Message-ID: CA+TgmoYe3VpvuyMF3JLHUQvyA38h2arQz-pSVn8DuWY6dwbctg@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Can we think of defining separate portable hash functions which can be
> used for the purpose of hash partitioning?

I think that would be a good idea, and I don't think it should even be
that hard. By data type:

- Integers. We'd need to make sure that we get the same results for
the same value on big-endian and little-endian hardware, and that
performance is good on both systems. That seems doable.

- Floats. There may be different representations in use on different
hardware, which could be a problem. Tom didn't answer my question
about whether any even-vaguely-modern hardware is still using non-IEEE
floats, which I suspect means that the answer is "no". If every bit
of hardware we are likely to find uses basically the same
representation of the same float value, then this shouldn't be hard.
(Also, even if this turns out to be hard for floats, using a float as
a partitioning key would be a surprising choice because the default
output representation isn't even unambiguous; you need
extra_float_digits for that.)

- Strings. There's basically only one representation for a string.
If we assume that the hash code only needs to be portable across
hardware and not across encodings, a position for which I already
argued upthread, then I think this should be manageable.

- Everything else. As far as I can tell, every other type is just a
composite of integers, floats, and strings.
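
To make the per-type points above concrete, here is a minimal sketch in C. It is not PostgreSQL's hash code and not a proposal for the actual patch: FNV-1a is swapped in only because it is short, and every function name here (fnv1a, portable_hash_uint64, portable_hash_int64, portable_hash_float8, portable_hash_text) is hypothetical. The float version assumes IEEE 754 binary64 doubles whose byte order matches the integer byte order, and it does not try to canonicalize NaNs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, 32-bit. Illustrative stand-in only, not PostgreSQL's hash. */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t    h = 2166136261u;    /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/*
 * Integers: hash the value, not the in-memory layout.  Shifting extracts
 * bytes least-significant first on any host, so big-endian and
 * little-endian machines feed identical bytes to the hash.
 */
static uint32_t
portable_hash_uint64(uint64_t u)
{
    unsigned char buf[8];

    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (u >> (8 * i));
    return fnv1a(buf, sizeof(buf));
}

static uint32_t
portable_hash_int64(int64_t v)
{
    return portable_hash_uint64((uint64_t) v);
}

/*
 * Floats: ASSUMES IEEE 754 binary64.  -0.0 compares equal to +0.0, so
 * collapse it first; then reuse the integer path on the bit pattern.
 */
static uint32_t
portable_hash_float8(double d)
{
    uint64_t    bits;

    if (d == 0.0)
        d = 0.0;                    /* turns -0.0 into +0.0 */
    memcpy(&bits, &d, sizeof(bits));
    return portable_hash_uint64(bits);
}

/* Strings: already a byte sequence; portable across hardware, not encodings. */
static uint32_t
portable_hash_text(const char *s, size_t len)
{
    return fnv1a((const unsigned char *) s, len);
}
```

A composite type could then be handled by hashing each field with the matching function and combining the results in a fixed field order, which is the sense in which everything else reduces to these three cases.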

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
