Re: [POC] hash partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-05-03 02:01:07
Message-ID: CA+TgmoYs_0OK0qB8vO7MY4ig_EWMsG4nmQRpPqHsJH0dL1Kabw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 2, 2017 at 9:01 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, Feb 28, 2017 at 6:33 AM, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>> In this patch, the user can specify a hash function with USING. However,
>> we might need default hash functions that are useful and
>> proper for hash partitioning.
>
> I suggest that we consider the hash functions more carefully. This is
> (effectively) an on-disk format so it can't be changed easily later.
>
> 1. Consider a partition-wise join of two hash-partitioned tables. If
> that's a hash join, and we just use the hash opclass, we immediately
> lose some useful bits of the hash function. Same for hash aggregation
> where the grouping key is the partition key.

Hmm, that could be a problem in some cases. I think there's probably
much less of a problem if the modulus isn't a power of two? With a
power-of-two modulus, every row in a given partition shares the same
low-order hash bits, so a hash join or hash aggregate that reuses the
same hash function within that partition has exactly those bits fixed;
with a non-power-of-two modulus, no individual bit is pinned down.
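A throwaway illustration (toy hash values, not anything PostgreSQL
actually computes):

#include <stdint.h>
#include <stdio.h>

/*
 * Toy demonstration of the bit-loss concern: with a power-of-two modulus,
 * every hash assigned to a given partition shares the same low-order bits;
 * with a non-power-of-two modulus, it doesn't.
 */
int
main(void)
{
	printf("modulus 8, partition 3 (low 3 bits shown):\n");
	for (uint32_t h = 0; h < 64; h++)
		if (h % 8 == 3)
			printf("  hash %2u -> %u%u%u\n", (unsigned) h,
				   (unsigned) ((h >> 2) & 1), (unsigned) ((h >> 1) & 1),
				   (unsigned) (h & 1));

	printf("modulus 7, partition 3 (low 3 bits shown):\n");
	for (uint32_t h = 0; h < 64; h++)
		if (h % 7 == 3)
			printf("  hash %2u -> %u%u%u\n", (unsigned) h,
				   (unsigned) ((h >> 2) & 1), (unsigned) ((h >> 1) & 1),
				   (unsigned) (h & 1));
	return 0;
}

Every line in the first list ends in 011; the second list has no such
property, so a downstream hash gets more to work with.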

> To fix this, I think we need to include a salt in the hash API. Each
> level of hashing can choose a random salt.

Do you mean that we'd salt partitioning hashing differently from
grouping hashing, which would be salted differently from aggregation
hashing, which, I suppose, would be salted differently from hash index
hashing? Or do you mean that you'd have to specify a salt when
creating a hash-partitioned table, and make sure it's the same across
all compatibly partitioned tables you might want to hash-join? The
latter sounds unappealing.
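If it's the former, I'd expect the hash support function API to grow
an explicit seed argument, along these lines (the function name and
the mixing scheme below are just a sketch of the idea, not a proposal
for the actual code):

#include <stdint.h>

/*
 * Illustrative only: a hash interface that takes an explicit seed, so each
 * consumer (partitioning, hash join, hash aggregation, hash indexes) can
 * salt differently while sharing one underlying function.  The constants
 * are the splitmix64 finalizer; any good mixer would do.
 */
static uint64_t
hash_uint64_seeded(uint64_t value, uint64_t seed)
{
	uint64_t	h = value ^ seed;

	h ^= h >> 30;
	h *= 0xBF58476D1CE4E5B9ULL;
	h ^= h >> 27;
	h *= 0x94D049BB133111EBULL;
	h ^= h >> 31;
	return h;
}

Each caller would pass its own fixed seed, so two levels of hashing on
the same key would no longer be correlated.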

> 2. Consider a partition-wise join where the join keys are varchar(10)
> and char(10). We can't do that join if we just use the existing hash
> strategy, because 'foo' = 'foo ' should match, but those values
> have different hashes when using the standard hash opclass.
>
> To fix this, we need to be smarter about normalizing values at a
> logical level before hashing. We can take this to varying degrees,
> perhaps even normalizing an integer to a numeric before hashing so
> that you can do a cross-type join on int=numeric.
>
> Furthermore, we need catalog metadata to indicate which hash functions
> are suitable for which cross-type comparisons. Or, to put it the other
> way, which typecasts preserve the partitioning.

You're basically describing what a hash opfamily already does, except
that we don't have a single opfamily that covers both varchar(10) and
char(10), nor do we have one that covers both int and numeric. We
have one that covers int2, int4, and int8, though. If somebody wanted
to make the ones you're suggesting, there's nothing preventing it,
although I'm not sure exactly how we'd encourage people to start using
the new one and deprecating the old one. We don't seem to have a good
infrastructure for that.
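That said, the normalization you're describing would mostly live in
the opfamily's hash support functions: hash a canonical form of the
value, e.g. with the trailing blanks stripped. A rough sketch, where
hash_bytes_seeded is just a stand-in for whatever byte hash we would
actually use:

#include <stddef.h>
#include <stdint.h>

/* Stand-in byte hash: seeded FNV-1a, illustrative only. */
static uint64_t
hash_bytes_seeded(const unsigned char *data, size_t len, uint64_t seed)
{
	uint64_t	h = 0xCBF29CE484222325ULL ^ seed;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= 0x100000001B3ULL;	/* FNV prime */
	}
	return h;
}

/*
 * Hash a blank-padded char(n) value with the padding stripped, so that
 * 'foo' and 'foo       ' hash identically and char(n) could share a hash
 * opfamily with text/varchar.
 */
static uint64_t
hash_bpchar_normalized(const char *value, size_t len, uint64_t seed)
{
	while (len > 0 && value[len - 1] == ' ')
		len--;					/* ignore blank padding */
	return hash_bytes_seeded((const unsigned char *) value, len, seed);
}

The int/numeric case is the same idea with a more expensive
canonicalization step.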

> 3. We might want to use a hash function that is a little slower but
> more resistant to collisions. We may even want to use a 64-bit
> hash.
>
> My opinion is that we should work on this hashing infrastructure
> first, and then support the DDL. If we get the hash functions right,
> that frees us up to create better plans, with better push-downs, which
> will be good for parallel query.

I am opposed to linking the fate of this patch to multiple
independent, possibly large, possibly difficult, possibly
controversial enhancements to the hashing mechanism. If there are
simple things that can reasonably be done in this patch to make hash
partitioning better, great. If you want to work on improving the
hashing mechanism as an independent project, also great. But I think
that most people would rather have hash partitioning in v11 than wait
for v12 or v13 so that other hashing improvements can be completed; I
know I would. If we say "we shouldn't implement hash partitioning
because some day we might make incompatible changes to the hashing
mechanism" then we'll never implement it, because that will always be
true. Even the day after we change it, there still may come a future
day when we change it again.

The stakes have already been raised by making hash indexes durable;
that too is arguably making future changes to the hashing
infrastructure harder. But I still think it was the right thing to
proceed with that work. If we get 64-bit hash codes in the next
release, and we want hash indexes to use them, then we will have to
invalidate existing hash indexes (again). That's sad, but not as sad
as it would have been to not commit the work to make hash indexes
durable. There's a chicken-and-egg problem here: without durable hash
indexes and hash partitioning, there's not much incentive to make
hashing better, but once we have them, changes create a backward
compatibility issue. Such is life; nothing we do is infinitely
future-proof.

The last significant overhaul of the hashing mechanism that I know
about was in 2009, cf. 2604359251d34177a14ef58250d8b4a51d83103b and
8205258fa675115439017b626c4932d5fefe2ea8. Until this email, I haven't
seen any complaints about the quality of that hash function either in
terms of speed or collision properties - what makes you think those
things are serious problems? I *have* heard some interest in widening
the output to 64 bits, and also in finding a way to combine multiple
hash values in some smarter way than we do at present. Seeding has
come up, too.
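For what it's worth, when I say "combine multiple hash values in some
smarter way than we do at present", I have in mind something like a
proper 64-bit combine step (this one is in the style of
boost::hash_combine, purely as a sketch):

#include <stdint.h>

/*
 * One possible 64-bit combine step; the constant is the 64-bit golden
 * ratio.  Illustrative only, not a proposal for the exact formula.
 */
static uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
	a ^= b + 0x9E3779B97F4A7C15ULL + (a << 6) + (a >> 2);
	return a;
}

None of that needs to hold up hash partitioning, though.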

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
