Re: Hash Functions

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: Hash Functions
Date: 2017-05-13 23:01:17
Message-ID: CAMp0ube+TYwN4Uu1kF+cQWjcAPvozCr_=ChezFfZmL26DZKenQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 12, 2017 at 12:38 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> That is a good question. I think it basically amounts to this
> question: is hash partitioning useful, and if so, for what?

Two words: parallel query. To get parallelism, one of the best
approaches is dividing the data, then doing as much work as possible
before combining it again. If you have hash partitions on some key,
then you can do partition-wise joins or partition-wise aggregation on
that key in parallel with no synchronization/communication overhead
(until the final result).

You've taken postgres pretty far in this direction already; hash
partitioning will take it one step further by allowing more pushdowns
and lower sync/communication costs.

Some of these things can be done with range partitioning, too, but see
my other message here:

https://www.postgresql.org/message-id/CAMp0ubfNMSGRvZh7N7TRzHHN5tz0ZeFP13Aq3sv6b0H37fdcPg%40mail.gmail.com

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-05-13 23:08:29 Re: Hash Functions
Previous Message Andres Freund 2017-05-13 22:49:21 Re: walsender & parallelism