Re: Hash Functions

From: amul sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Subject: Re: Hash Functions
Date: 2017-08-30 14:43:34
Message-ID: CAAJ_b942x+gE3u2r4Mk41X2wCtWLaGdLotRqDo+_X6ELCktp4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 29, 2017 at 11:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> > Attaching patch 0002 for the reviewer's testing.
>
> I think that this 0002 is not something we can think of committing
> because there's no guarantee that hash functions will return the same
> results on all platforms. However, what we could and, I think, should
> do is hash some representative values of each data type and verify
> that hash(x) and hashextended(x, 0) come out the same at least as to
> the low-order 32 bits -- and maybe that hashextend(x, 1) comes out
> different as to the low-order 32 bits. The easiest way to do this
> seems to be to cast to bit(32). For example:
>
> SELECT v, hashint4(v)::bit(32) as standard,
> hashint4extended(v, 0)::bit(32) as extended0,
> hashint4extended(v, 1)::bit(32) as extended1
> FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
> WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
> OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
>
> I suggest adding a version of this for each data type.
>

​Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours as follow:

1) standard and extended0 output for the jsonb_hash case is not same.
2) standard and extended0 output for the hash_range case is not same when
input
is int4range(550274, 1550274) other case in the patch are fine. This
can be
reproducible with other values as well, for e.g. int8range(1570275,
208112489).

Will look into this tomorrow.

​Another case which I want to discuss is, extended and standard version of
hashfloat4, hashfloat8 & hash_numeric function will have the same output
for zero
value irrespective of seed value. Do you think we need a fix for this?

> From your latest version of 0001, I get:
>
> rangetypes.c:1297:8: error: unused variable 'rngtypid'
> [-Werror,-Wunused-variable]
> Oid rngtypid = RangeTypeGetOid(r);
>
> ​Fixed in the attached version.​

> I suggest not duplicating the comments from each regular function into
> the extended function, but just writing /* Same approach as hashfloat8
> */ when the implementation is non-trivial (no need for this if the
> extended function is a single line or the original function had no
> comments anyway).
>
> ​
Fixed in the attached version.​

> hash_aclitem seems embarrassingly stupid. I suggest that we make the
> extended version slightly less stupid -- i.e. if the seed is non-zero,
> actually call hash_any_extended on the sum and pass the seed through.
>
> * Reset info about hash function whenever we pick up new info
> about
> * equality operator. This is so we can ensure that the hash
> function
> * matches the operator.
> */
> typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
> + typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
>
> Adjust comment: "about hash function" -> "about hash functions", "hash
> functions matches" -> "hash functions match".
>
> ​
Fixed in the attached version.​

> +extern Datum
> +hash_any_extended(register const unsigned char *k, register int
> + keylen, uint64 seed);
>
> Formatting.
> ​
>

​Fixed in the attached version.​

​Thanks !

Regards,
Amul​

Attachment Content-Type Size
0001-add-optional-second-hash-proc-v3.patch application/octet-stream 61.7 KB
0002-test-Hash_functions_v2_wip.patch application/octet-stream 28.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-30 15:19:38 Re: Parallel worker error
Previous Message Robert Haas 2017-08-30 14:31:42 Re: expanding inheritance in partition bound order