Re: General purpose hashing func in pgbench

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: General purpose hashing func in pgbench
Date: 2018-03-06 13:47:43
Message-ID: 75f45845-24ed-347b-66d5-2f18d39c6793@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Teodor,

Thank you for reviewing this patch.

On 06.03.2018 15:53, Teodor Sigaev wrote:
>>> Patch applies, compiles, pgbench & global "make check" ok, doc
>>> built ok.
>
> Agree.
>
> If I understand upthread correctly, implementation of Murmur hash
> algorithm based on Austin Appleby work
> https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp
>
> If so, I have notice and objections:
>
> 1) Seems, it's good idea to add credits to Austin Appleby to
> comments.
>

Sounds fair, I'll send an updated version soon.

> 2) Reference implementaion directly says (link above): // 2. It will
> not produce the same results on little-endian and big-endian //
> machines.
>
> I don't think that is good thing for testing and benchmarking for
> several reasons: it could produce different data collection,
> different selects, different distribution.
>
> 3) Again, from comments of reference implementation: // Note - This
> code makes a few assumptions about how your machine behaves - // 1.
> We can read a 4-byte value from any address without crashing
>
> It's not true for all supported platforms. Any box with strict
> aligment will SIGBUSed here.
>

I think that both points refer to the fact that original algorithm
accepts a byte string as an input, slices it up by 8 bytes and form
unsigned int values from it. In that case the order of bytes in the
input string does matter since it may result in different integers on
different architectures. And it is also fair requirement for a byte
string to be aligned as some architectures cannot handle unaligned data.
In this patch though I slightly modified the original algorithm in a way
that it takes unsigned ints as an input (not byte string), so neither of
this points should be a problem as it seems to me. But I'll double check
it on big-endian machine with strict alignment (Sparc).

Thanks!

--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arthur Zakirov 2018-03-06 14:00:25 Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
Previous Message Darafei Komяpa Praliaskouski 2018-03-06 13:39:07 Re: All Taxi Services need Index Clustered Heap Append