Re: Hash Functions

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-16 15:10:39
Message-ID: CAMp0ubcQ3VYdU1kNUCOmpj225U4hk6ZEoaUVeReP8h60p+mv1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 15, 2017 at 1:04 PM, David Fetter <david(at)fetter(dot)org> wrote:
> As the discussion has devolved here, it appears that there are, at
> least conceptually, two fundamentally different classes of partition:
> public, which is to say meaningful to DB clients, and "private", used
> for optimizations, but otherwise opaque to DB clients.
>
> Mashing those two cases together appears to cause more problems than
> it solves.

I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.

But I also believe hash partitioning is important and we shouldn't
give up on it yet.

That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.

With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
- Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
- FKs would only be permitted if the two table's partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
* All partitions have the same permissions
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
- perhaps also COPY for bulk loading/dumping, in case we get clients
smart enough to do their own hashing.

The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-05-16 15:14:27 Re: Race conditions with WAL sender PID lookups
Previous Message Ildus Kurbangaliev 2017-05-16 14:26:00 Re: Bug in ExecModifyTable function and trigger issues for foreign tables