Re: Hash Functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Subject: Re: Hash Functions
Date: 2017-08-29 18:18:15
Message-ID: CA+TgmoYPvoTMGJYkVBA=2j1o1wpZ-WZCYiS7_B-AqqZBkWT4HQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> Attaching patch 0002 for the reviewer's testing.

I think that this 0002 is not something we can think of committing
because there's no guarantee that hash functions will return the same
results on all platforms. However, what we could and, I think, should
do is hash some representative values of each data type and verify
that hash(x) and hashextended(x, 0) come out the same at least as to
the low-order 32 bits -- and maybe that hashextend(x, 1) comes out
different as to the low-order 32 bits. The easiest way to do this
seems to be to cast to bit(32). For example:

SELECT v, hashint4(v)::bit(32) as standard,
hashint4extended(v, 0)::bit(32) as extended0,
hashint4extended(v, 1)::bit(32) as extended1
FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);

I suggest adding a version of this for each data type.

From your latest version of 0001, I get:

rangetypes.c:1297:8: error: unused variable 'rngtypid'
[-Werror,-Wunused-variable]
Oid rngtypid = RangeTypeGetOid(r);

I suggest not duplicating the comments from each regular function into
the extended function, but just writing /* Same approach as hashfloat8
*/ when the implementation is non-trivial (no need for this if the
extended function is a single line or the original function had no
comments anyway).

hash_aclitem seems embarrassingly stupid. I suggest that we make the
extended version slightly less stupid -- i.e. if the seed is non-zero,
actually call hash_any_extended on the sum and pass the seed through.

* Reset info about hash function whenever we pick up new info about
* equality operator. This is so we can ensure that the hash function
* matches the operator.
*/
typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
+ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);

Adjust comment: "about hash function" -> "about hash functions", "hash
functions matches" -> "hash functions match".

+extern Datum
+hash_any_extended(register const unsigned char *k, register int
+ keylen, uint64 seed);

Formatting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-29 19:24:46 Re: Improving overflow checks when adding tuple to PGresult Re: [GENERAL] Retrieving query results
Previous Message Bossart, Nathan 2017-08-29 16:47:30 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands