Re: Hash Functions

From: amul sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Joe Conway <mail(at)joeconway(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Subject: Re: Hash Functions
Date: 2017-08-31 12:40:28
Message-ID: CAAJ_b96jYefFEXUgVqhLbNh=fM9NfXcx6RLktr9nZC-WNvsucg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> > Thanks for the suggestion, I have updated 0002-patch accordingly.
> > Using this I found some strange behaviours as follow:
> >
> > 1) standard and extended0 output for the jsonb_hash case is not same.
> > 2) standard and extended0 output for the hash_range case is not same when
> > input
> > is int4range(550274, 1550274) other case in the patch are fine. This
> can
> > be
> > reproducible with other values as well, for e.g. int8range(1570275,
> > 208112489).
> >
> > Will look into this tomorrow.
>
> Those sound like bugs in your patch. Specifically:
>
> + /* Same approach as hash_range */
> + result = hash_uint32_extended((uint32) flags, seed);
> + result ^= lower_hash;
> + result = (result << 1) | (result >> 63);
> + result ^= upper_hash;
> ​
>

Yes, you are correct.​

> ​
>
> That doesn't give compatible results. The easiest thing to do might
> be to rotate the high 32 bits and the low 32 bits separately.
> JsonbHashScalarValueExtended has the same problem. Maybe create a
> helper function that does something like this (untested):
>
> ((x << 1) & UINT64COUNT(0xfffffffefffffffe)) | ((x >> 31) &
> UINT64CONST(0x100000001))
> ​
>

This working as expected, also tested by executing the following SQL
multiple times:

SELECT v as value, hash_range(v)::bit(32) as standard,
hash_range_extended(v, 0)::bit(32) as extended0,
hash_range_extended(v, 1)::bit(32) as extended1
FROM (VALUES (int8range(floor(random() * 100)::int8, floor(random() *
1000)::int8)),
(int8range(floor(random() * 1000)::int8, floor(random() *
10000)::int8)),
(int8range(floor(random() * 10000)::int8, floor(random() *
100000)::int8)),
(int8range(floor(random() * 10000000)::int8, floor(random() *
100000000)::int8)),
(int8range(floor(random() * 100000000)::int8, floor(random() *
1000000000)::int8))) x(v)
WHERE hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32); ​

> > Another case which I want to discuss is, extended and standard version of
> > hashfloat4, hashfloat8 & hash_numeric function will have the same output
> for
> > zero
> > value irrespective of seed value. Do you think we need a fix for this?
>
> Yes, I think you should return the seed rather than 0 in the cases
> where the current code hard-codes a 0 return. That will give the same
> behavior in the seed == 0 case while cheaply getting us something a
> bit different when there is a seed.
>
> ​Fixed in the attached version.​

> BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
> patch.
>
> Added i
n the attached version
​.​

Thanks for your all the suggestions.

Regards,
Amul

Attachment Content-Type Size
0001-add-optional-second-hash-proc-v4.patch application/octet-stream 63.0 KB
0002-test-Hash_functions_v3_wip.patch application/octet-stream 27.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-08-31 12:40:45 Re: pgbench tap tests & minor fixes.
Previous Message Tom Lane 2017-08-31 12:34:11 Re: log_destination=file