Re: Hash Functions

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-13 18:14:16
Message-ID: CAMp0ubfNMSGRvZh7N7TRzHHN5tz0ZeFP13Aq3sv6b0H37fdcPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 12, 2017 at 11:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Forget hash partitioning. There's no law saying that that's a good
> idea and we have to have it. With a different set of constraints,
> maybe we could do it, but I think the existing design decisions have
> basically locked it out --- and I doubt that hash partitioning is so
> valuable that we should jettison other desirable properties to get it.

A lot of the optimizations that can make use of hash partitioning
could also make use of range partitioning. But let me defend hash
partitioning:

* hash partitioning requires fewer decisions by the user
* naturally balances data and workload among partitions in most cases
* easy to match with a degree of parallelism

But with range partitioning, you can have situations where different
tables have different distributions of data. If you partition to
balance the data between partitions in both cases, then that makes
partition-wise join a lot harder because the boundaries don't line up.
If you make the boundaries line up to do partition-wise join, the
partitions might have wildly different amounts of data in them. Either
way, it makes parallelism harder.

Even without considering joins, range partitioning could force you to
make a choice between balancing the data and balancing the workload.
If you are partitioning based on date, then a lot of the workload will
be on more recent partitions. That's desirable sometimes (e.g. for
vacuum) but not always desirable for parallelism.

Hash partitioning doesn't have these issues and goes very nicely with
parallel query.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2017-05-13 19:07:20 Valgrind & tests for `numeric`
Previous Message Tom Lane 2017-05-13 17:57:15 Re: Hash Functions