Re: parallel joins, and better parallel explain

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel joins, and better parallel explain
Date: 2015-12-04 16:05:55
Message-ID: CANP8+jJfV8HP7ZNvfbKNd5tDd5fRLw_BwnLV-4EuDmk+VAn9Dw@mail.gmail.com
Lists: pgsql-hackers

On 30 November 2015 at 17:52, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> My idea is that you'd end up with a plan like this:
>
> Gather
>   -> Hash Join
>         -> Parallel Seq Scan
>         -> Parallel Hash
>               -> Parallel Seq Scan
>
> Not only does this build only one copy of the hash table instead of N
> copies, but we can parallelize the hash table construction itself by
> having all workers insert in parallel, which is pretty cool.

Hmm. If the hash table is small, it should be OK to keep it locally. If it's
larger, we need the shared copy. It seems like we can do the math on when to
use each kind of hash table... build it in shmem and then copy it locally if
needed.
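
As a rough illustration of "doing the math", here is a minimal sketch of such
a decision rule. It is not planner code: the function name, its parameters,
and the single memory budget are all assumptions made for the example. It
only captures the idea that N private copies cost N times the memory of one
shared copy.

```python
def choose_hash_table_kind(est_bytes, n_participants, mem_budget_bytes):
    """Pick 'local' or 'shared' for the hash table (hypothetical rule).

    est_bytes        -- estimated size of one complete hash table
    n_participants   -- processes that would each hold a private copy
    mem_budget_bytes -- assumed total memory the join may use
    """
    if est_bytes < 0 or n_participants < 1 or mem_budget_bytes < 0:
        raise ValueError("sizes must be non-negative, participants >= 1")

    # Private copies multiply the footprint by the number of participants.
    total_if_local = est_bytes * n_participants

    # Keep private copies only if all of them fit in the budget together;
    # otherwise fall back to a single copy in shared memory.
    return "local" if total_if_local <= mem_budget_bytes else "shared"
```

A real rule would also have to weigh the cost of copying out of shmem against
the cost of contended access to the shared table, which this sketch ignores.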

Another way might be to force the hash table into N batches, then give each
scan the task of handling one batch. That would allow a much larger hash
table to still be kept locally, moving the problem towards routing the data.
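
The batching idea can be sketched as follows. This is a single-process toy,
not executor code: the function names and the (key, payload) row layout are
assumptions for illustration, and each per-batch table stands in for the
private hash table one scan would own.

```python
from collections import defaultdict


def batch_of(key, n_batches):
    """Map a join key to the batch (and hence the scan) that owns it."""
    return hash(key) % n_batches


def partitioned_hash_join(inner, outer, n_batches):
    """Join two lists of (key, payload) rows using one table per batch."""
    # Build phase: every inner row goes into the table of exactly one batch,
    # so each table holds roughly 1/N of the inner relation.
    tables = [defaultdict(list) for _ in range(n_batches)]
    for key, payload in inner:
        tables[batch_of(key, n_batches)][key].append(payload)

    # Probe phase: this is the routing problem. Each outer row must reach
    # the batch that owns its key before it can be probed.
    result = []
    for key, payload in outer:
        owner = tables[batch_of(key, n_batches)]
        for inner_payload in owner.get(key, ()):
            result.append((key, inner_payload, payload))
    return result
```

For example, joining inner rows `[(1, "a"), (2, "b")]` with outer rows
`[(2, "x"), (3, "y")]` gives the same matches for any number of batches; only
the table that each row is sent to changes.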

I'm not immediately convinced by the coolness of loading the hash table in
parallel. A whole class of bugs could be avoided if we choose not to, plus
the hash table is frequently so small a part of the HJ that it's not going
to gain us very much.

The other way to look at what you've said here is that you don't seem to
have a way of building the hash table in only one process, which concerns
me.

> What I can confirm at this point is that
> I've thought about the problem you're asking about here, and that
> EnterpriseDB intends to contribute code to address it.

Good

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
