Re: Postgres 11, partitioning with a custom hash function

From: Harry B <harrysungod(at)gmail(dot)com>
To: david(dot)rowley(at)2ndquadrant(dot)com
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres 11, partitioning with a custom hash function
Date: 2018-10-04 20:43:46
Message-ID: CAMG7=yUde-E+4Fd0w=VU7VsgiL0yqpVB6uCi5drs5KLDyOCzFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks for the quick response David! this has been really helpful.

Looking at the code, this step wasn't totally unnecessary - if I had
multi-column hash you would have had to do this step anyways - because pg
hashes each column separately and combines them. True, unnecessary for
single column hashes. It would have been better for the custom function to
handle all columns at the same time, but then entire API surface would have
had to change. At least it makes sense to me why it is this way....

All hope is not lost, at least for my case... because.... the bitshifting
you have was on 'a', which was zero. So the expression

a ^= b + UINT64CONST(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);

becomes

a = b + UINT64CONST(0x49a0f4dd15e5a8e3)

This also explains why I noticed a constant-offset from the desired value
regardless of the actual key being hashed.

It also works for the 32 partition example I showed
https://play.golang.org/p/kcD-JhyLYD6
(original session/data in http://dpaste.com/382NDBG )

Now the big question: How scared should I be relying on this? I don't mind
it breaking on major version upgrades (which would mean I need to dump &
restore my entire set), but how likely is it to change unannounced in a
minor/security release? Unless of course, you break it in a way that makes
custom-hash function impossible.

Thanks
--
Harry

On Thu, Oct 4, 2018 at 12:39 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 5 October 2018 at 06:18, Harry B <harrysungod(at)gmail(dot)com> wrote:
> >
> > Thank you David! These helped me create an operator class.
> > However, there still seems to be a 'off-by-a-fixed-N' difference between
> the
> > hash value returned and how PG selects the partition.
>
> hmm, actually, this is probably due to the hash_combine64() call in
> compute_partition_hash_value(). This seems to combine the hash value
> with 0 regardless of if there's another partition key column to hash.
> If you could somehow do the reverse of what hash_combine64() will do
> to you hash before returning it to the function then you might get
> somewhere, but that does not look possible since it appears to be
> throwing away some bits.
>
> It may not have been a great choice to decide to have
> compute_partition_hash_value() do this unnecessary combine, but it's
> likely a few months too late to change that now.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

--
Harry

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2018-10-05 00:39:55 Re: Postgres 11, partitioning with a custom hash function
Previous Message David Rowley 2018-10-04 19:39:10 Re: Postgres 11, partitioning with a custom hash function