From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | David Fetter <david(at)fetter(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com> |
Subject: | Re: Hash Functions |
Date: | 2017-05-16 15:29:00 |
Message-ID: | CA+TgmoZ-WCDxkjiHfzKv4Fa18NK-M2d7g1-1pGK2EKMTqGJ1GA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, May 16, 2017 at 11:10 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> With hash partitioning:
> * User only specifies number of partitions of the parent table; does
> not specify individual partition properties (modulus, etc.)
> * Dump/reload goes through the parent table (though we may provide
> options so pg_dump/restore can optimize this)
> * We could provide syntax to adjust the number of partitions, which
> would be expensive but still useful sometimes.
> * All DDL should be on the parent table, including check constraints,
> FKs, unique constraints, exclusion constraints, indexes, etc.
> - Unique and exclusion constraints would only be permitted if the
> keys are a superset of the partition keys.
> - FKs would only be permitted if the two tables' partition schemes
> match and the keys are members of the same hash opfamily (this could
> be relaxed slightly, but it gets a little confusing if so)
> * No attach/detach of partitions
> * All partitions have the same permissions
> * Individual partitions would only be individually-addressable for
> maintenance (like reindex and vacuum), but not for arbitrary queries
> - perhaps also COPY for bulk loading/dumping, in case we get clients
> smart enough to do their own hashing.
I don't really find this a very practical design. If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries. Otherwise, nothing works. And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names. If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.
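The collision described above can be sketched with a toy model. This is only an illustration, not pg_dump or backend code: the `restore` and `first_free_name` helpers, the `orders_p<N>` naming scheme, and the relation names are all hypothetical.

```python
import itertools


def first_free_name(catalog, prefix="orders_p"):
    """Hypothetical system naming rule: pick the first unused prefix + number."""
    for i in itertools.count():
        candidate = f"{prefix}{i}"
        if candidate not in catalog:
            return candidate


def restore(dump, auto_name):
    """Replay a dump in order; an entry of None means 'no name was preserved'."""
    catalog = set()
    for name in dump:
        if name is None:
            # The name was not dumped, so the system assigns a fresh one.
            name = auto_name(catalog)
        if name in catalog:
            raise ValueError(f"relation {name!r} already exists")
        catalog.add(name)
    return catalog


# Names preserved: the restore succeeds.
restore(["orders", "orders_p5", "orders_p6", "orders_p1"], first_free_name)

# Names not preserved: the second anonymous partition is assigned
# "orders_p1", and the explicitly named table that appears later in
# the dump then fails to restore.
try:
    restore(["orders", None, None, "orders_p1"], first_free_name)
except ValueError as exc:
    print("restore failed:", exc)
```

In this model the failure depends only on the order of objects in the dump, which is why preserving the original names avoids it.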
If we had the ability to have anonymous pg_class entries -- relations
that have no names -- then maybe it would be possible to make
something like what you're talking about work. But that does not seem
easy to do. There's a unique index on (relname, relnamespace) for
good reason, and we can't make it partial on a system catalog. We
could make the relname column allow nulls, but that would add overhead
to any code that needs to access the relation name, and there's a fair
amount of that.
Similarly, if we had the ability to associate multiple relfilenodes
with a single relation, and if index entries could point to
<which-relfilenode, block, offset> rather than just <block, offset>,
then we could also make this work. But either of those things would
require significant re-engineering and would have downsides in other
cases.
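To make the pointer-format point concrete, here is a minimal sketch, assuming a relation whose rows live in two storage files. The `TuplePointer` and `WidePointer` types, the file numbers, and the row contents are invented for illustration and do not correspond to actual PostgreSQL structures.

```python
from typing import NamedTuple


class TuplePointer(NamedTuple):
    """Index entry in the <block, offset> form."""
    block: int
    offset: int


class WidePointer(NamedTuple):
    """Hypothetical index entry in the <which-relfilenode, block, offset> form."""
    relfilenode: int
    block: int
    offset: int


# One relation backed by two files; both have a row at block 0, offset 1.
files = {
    16384: {(0, 1): "row A"},
    16385: {(0, 1): "row B"},
}


def candidates(files, ptr):
    """Rows a <block, offset> pointer could refer to across all files."""
    key = (ptr.block, ptr.offset)
    return sorted(rows[key] for rows in files.values() if key in rows)


def fetch_wide(files, ptr):
    """A wide pointer names the file, so the lookup is unambiguous."""
    return files[ptr.relfilenode][(ptr.block, ptr.offset)]


print(candidates(files, TuplePointer(0, 1)))        # two possible rows
print(fetch_wide(files, WidePointer(16385, 0, 1)))  # exactly one row
```

The short pointer matches a row in each file, so an index built on it cannot say which partition a row belongs to; the wide pointer resolves that at the cost of a larger entry for every index.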
If Java has portable hash functions, why can't we?
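For reference, the kind of portability meant here can be illustrated with Java's `String.hashCode`, whose result is fixed by its documentation as s[0]*31^(n-1) + ... + s[n-1] over UTF-16 code units in 32-bit signed arithmetic. The sketch below reimplements that formula in Python; the function name `java_string_hash` is made up for this example, and nothing here is a proposal for how PostgreSQL should hash.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode for a string.

    Java hashes UTF-16 code units, so the string is encoded first
    instead of iterating over Python code points.
    """
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        # Keep only the low 32 bits, as Java int arithmetic does.
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hash("hello"))  # 99162322, same as "hello".hashCode() in Java
```

Because the result depends only on the input and a fixed formula, every platform computes the same value, which is the property a dump-and-reload of hash-partitioned data would need.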
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company