Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-10-24 07:13:12
Message-ID: CAAJ_b96n4Q_tNLqVG78pm=CG9JiDNxmUiK=BWODzoegPJkW8+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 13, 2017 at 3:00 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-10-12 17:27:52 -0400, Robert Haas wrote:
>> On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> >> In other words, it's not utterly fixed in stone --- we invented
>> >> --load-via-partition-root primarily to cope with circumstances that
>> >> could change hash values --- but we sure don't want to be changing it
>> >> with any regularity, or for a less-than-excellent reason.
>> >
>> > Yea, that's what I expected. It'd probably good for somebody to run
>> > smhasher or such on the output of the combine function (or even better,
>> > on both the 32 and 64 bit variants) in that case.
>>
>> Not sure how that test suite works exactly, but presumably the
>> characteristics in practice will depend the behavior of the hash
>> functions used as input the combine function - so the behavior could
>> be good for an (int, int) key but bad for a (text, date) key, or
>> whatever.
>
> I don't think that's true, unless you have really bad hash functions on
> the the component hashes. A hash combine function can't really do
> anything about badly hashed input, what you want is that it doesn't
> *reduce* the quality of the hash by combining.
>

I tried to get suggested SMHasher[1] test result for the hash_combine
for 32-bit and 64-bit version.

SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
N=255, using 256-N as the seed, for the hash_combine testing we
needed two hash value to be combined, for that, I've generated 64
and 128-bit hash using cityhash functions[2] for the given smhasher
key then split in two part to test 32-bit and 64-bit hash_combine
function respectively. Attached patch for SMHasher code changes &
output of 32-bit and 64-bit hash_combine testing. Note that I have
skipped speed test this test which is irrelevant here.

By referring other hash function results [3], we can see that hash_combine
test results are not bad either.

Do let me know if current testing is not good enough or if you want me to do
more testing, thanks.

1] https://github.com/aappleby/smhasher
2] https://github.com/aappleby/smhasher/blob/master/src/CityTest.cpp
3] https://github.com/rurban/smhasher/tree/master/doc

Regards,
Amul

Attachment Content-Type Size
smhasher_test_on_hash_combine64.txt text/plain 19.2 KB
smhasher_test_on_hash_combine32.txt text/plain 14.0 KB
0001-add-hash_combine-functions-in-SMHasher.patch application/octet-stream 3.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ivan Kartyshov 2017-10-24 07:56:15 Re: WIP: long transactions on hot standby feedback replica / proof of concept
Previous Message Simon Riggs 2017-10-24 06:20:10 Remove secondary checkpoint