Re: Hash Functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-19 14:09:40
Message-ID: CA+TgmobXtLKhkRKnXnKnxkVuq0Ad2iCDKs5LbT_rAuEaDLG0+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 19, 2017 at 2:36 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I could agree to something like that. Let's explore some of the challenges
> there and potential solutions:
>
> 1. Dump/reload of hash partitioned data.
>
> Falling back to restore-through-the-root seems like a reasonable answer
> here. Moving to a different encoding is not an edge case, but it's not
> common either, so a performance penalty seems acceptable. I'm not
> immediately sure how we'd implement this in pg_dump/restore, so I'd feel a
> little more comfortable if I saw a sketch.

Right, I think this needs some investigation. I can't whip up a
sketch on short notice, but I'll see if someone else at EnterpriseDB
can work on it unless somebody else wants to take a crack at it.

> 2. Having a lot of hash partitions would be cumbersome
>
> The user would need to create and manage each partition, and try to do
> global operations in a sane way. The normal case would probably involve
> scripts to do things like add an index to all partitions, or a column. Many
> partitions would also just pollute the namespace unless you remember to put
> them in a separate schema (yes, it's easy, but most people will still
> forget). Some syntax sugar would go a long way here.

I agree. Adding a column already cascades to all children, and
there's a proposal to make the same thing true for indexes. See
discussion beginning at
http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0cfd@postgrespro.ru

I haven't had time to review the code posted there yet, but I would
like to see something along the lines discussed there committed to
v11, and hopefully also something around foreign keys. It should be
possible to create an outbound foreign key on a foreign table and have
that cascade to all children. Conversely, it should also be possible
to create a foreign key referencing a partitioned table provided that
the foreign key references the partitioning key, and that there's a
unique index on those same columns on every partition. (Referencing a
foreign key that is not the partitioning key will have to wait for
global indexes, I think.)

These things are useful not just for hash partitioning, but also for
list and range partitioning, and we'll save a lot of work if we can
use the same infrastructure for both cases.

> 3. The user would need to specify details they really don't care about for
> each partition.
>
> Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are tedious
> boilerplate. And if the user makes a mistake, then 1/16 of inserts start
> failing. Probably would be caught during testing, but not exactly a good
> user experience. I'm not thrilled about this, considering that all the user
> really wants is 16 partitions, but it's not the end of the world.

As I said on the hash partitioning thread, I think the way to handle
this is to get the basic feature in first, then add some convenience
syntax to auto-create the partitions. That shouldn't be very
difficult to implement; I just didn't want to complicate things more
than necessary for the first version. The same issue arose when
discussing list and range partitioning: Oracle has syntax like ours,
but with the ability (requirement?) to create the partitions in the
initial CREATE TABLE statement. However, I wanted to make sure we had
the "inconvenient" syntax fully working and fully tested before we
added that, because I believe pg_dump needs to dump everything out the
long way.

> 4. Detach is a foot-gun
>
> If you detach a partition, random inserts will start failing. Not thrilled
> about this, but a hapless user would accept most of the blame if they
> stumble over it. Another way of saying this is with hash partitioning you
> really need the whole set for the table to be online at all. But we can't
> really enforce that, because it would limit some of the flexibility that you
> have in mind.

Yes, I agree with all of that. I don't think it's really going to be
a problem in practice. The only reason to detach a hash partition is
if you want to split it, and we may eventually have convenience syntax
to do that in an automated (i.e. less error-prone) way. If somebody
does it manually and without a plan for putting back a replacement
partition, they may be sad, but if somebody puts a CHECK (false)
constraint on their table, they may be sad about that, too. It's more
important to allow for flexibility than to prohibit every stupid thing
somebody might try to do. Also, documentation helps. We've got a
chapter on partitioning and it can be expanded to discuss these kinds
of issues.

> Stepping back, your approach might be closer to the general postgres
> philosophy of allowing the user to assemble from spare parts first, then a
> few releases later we offer some pre-built subassemblies, and a few releases
> later we make the typical cases work out of the box. I'm fine with it as
> long as we don't paint ourselves into a corner.

That's basically my thinking here. Right now, our partitioning is
primitive in numerous ways, and so the rough edges are pretty visible.
However, I believe that with careful design we can file down many of
those rough edges over time. Now, it's probably never going to be
quite as smooth as if the system had been designed for partitions from
the ground up, or at least not any time in the foreseeable future, but
I think it can still be very good. If we add the above-referenced
logic for index and foreign key handling, convenience syntax for
creating partitions along with tables and for data-movement
operations, better partition pruning in the planner, run-time
partition-pruning in the executor, partitionwise join and aggregate,
MergeAppend -> Append strength reduction when the required sortorder
matches the partition order, UPDATE tuple routing, default partitions,
and so on, I believe that partitioning will go from "eh, that's better
than inheritance" to "hey, that's actually really good". There is a
ton of work to do there, but I think it is all doable within the
current infrastructure, and all of those things can (and in a number
of cases already are) being worked on as separate patches, so we can
get into each release what is ready and push off to the next release
what isn't. As we go, we'll have fewer and fewer cases where
partitioning a table regresses performance and more and more cases
where the stuff you want to do just works.

Probably the toughest nut to crack is global indexes. An argument was
made on this very mailing list a number of years ago that nobody
should want global indexes because $REASONS, but I was a little
skeptical of that argument at the time and it's been clear to me that
EnterpriseDB's customers, at least, do not in any way accept those
arguments. It works on Oracle or other systems they use, and they
find it useful enough that they're unwilling to be without it. If
PostgreSQL provides the same functionality, they'll use it for more
things than if it doesn't. I find myself more than a bit intimidated
at the prospect of actually trying to make this work, but I've been
convinced that people won't stop asking until it does.

> Of course we still have work to do on the hash functions. We should solve at
> least the most glaring portability problems, and try to harmonize the hash
> opfamilies. If you agree, I can put together a patch or two.

I definitely agree with solving the portability problems to the extent
that we can reasonably do so. I think adding more cross-type hash
opfamilies is a mildly good thing: I don't object to it, it probably
makes sense to do at the same time as any other hashing changes we
want to make, and it's better than not doing it. At the same time, I
wouldn't walk over hot coals for it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-05-19 14:16:22 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Petr Jelinek 2017-05-19 13:33:04 Re: Get stuck when dropping a subscription during synchronizing table