From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru> |
Subject: | Re: Parallel Hash take II |
Date: | 2017-09-02 01:30:59 |
Message-ID: | CA+TgmobQpSBj0FRepXrwfdMgmbcWXffrJvA_-j_vsKHfABsR1w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Sep 1, 2017 at 7:42 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> I'm thinking about something like this:
>>
>> Gather
>>   -> Nested Loop
>>         -> Parallel Seq Scan
>>         -> Hash Join
>>               -> Seq Scan
>>               -> Parallel Hash
>>                     -> Parallel Seq Scan
>>
>> The hash join has to be rescanned for every iteration of the nested loop.
>
> I think you mean:
>
> Gather
>   -> Nested Loop
>         -> Parallel Seq Scan
>         -> Parallel Hash Join
>               -> Parallel Seq Scan
>               -> Parallel Hash
>                     -> Parallel Seq Scan
I don't, though, because that's nonsense. Maybe what I wrote is also
nonsense, but it is at least different nonsense.

Let's try it again with some table names:

Gather
  -> Nested Loop
        -> Parallel Seq Scan on a
        -> (Parallel?) Hash Join
              -> Seq Scan on b (NOT A PARALLEL SEQ SCAN)
              -> Parallel Hash
                    -> Parallel Seq Scan on c
I argue that this is a potentially valid plan. b, of course, has to
be scanned in its entirety by every worker every time through, which
is why it's not a Parallel Seq Scan, but that requirement does not
apply to c. If we take all the rows in c and stick them into a
DSM-based hash table, we can reuse them every time the hash join is
rescanned and, AFAICS, that should work just fine, and it's probably a
win over letting each worker build a separate copy of the hash table
on c, too.
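
To make the shape of that argument concrete, here is a toy Python sketch, not PostgreSQL code. It assumes an ordinary in-memory dict can stand in for the DSM-based hash table and that everything fits in a single batch. The names build_shared_hash, hash_join_rescan, and worker are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def build_shared_hash(c_rows, key):
    """Build the hash table on c once (stand-in for the DSM-based table)."""
    table = defaultdict(list)
    for row in c_rows:
        table[row[key]].append(row)
    return table


def hash_join_rescan(b_rows, shared_hash, key):
    """One (re)scan of the b-c join: read all of b, probe the shared table."""
    for b in b_rows:
        # .get() probes without inserting empty entries into the table.
        for c in shared_hash.get(b[key], []):
            yield (b, c)


def worker(a_chunk, b_rows, shared_hash, key, match):
    """One worker: its share of a on the outer side of the nested loop.

    For every outer row the b-c join is rescanned in full, but the hash
    table on c is reused rather than rebuilt.
    """
    out = []
    for a in a_chunk:
        for b, c in hash_join_rescan(b_rows, shared_hash, key):
            if match(a, b, c):
                out.append((a, b, c))
    return out
```

Each worker would call worker() with a different slice of a but the same b_rows and the same shared table, which is the point: only a is divided among workers, b is read in full on every rescan, and c is hashed exactly once.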
Of course, there's the "small" problem that I have no idea what to do
if the b-c join is (or becomes) multi-batch. When I was thinking
about this before, I was imagining that this case might Just Work with
your patch provided that you could generate a plan shaped like this,
but now I see that that's not actually true, because of multiple
batches.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company