Re: Parallel Hash take II

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-09-01 22:45:31
Message-ID: CA+TgmoZa5AMpu62-SFCcrtrYAqye32+SRpTvyDPvFCpRsw9yHQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 1, 2017 at 6:32 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Sep 2, 2017 at 5:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Aug 31, 2017 at 8:53 AM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> Check out ExecReScanGather(): it shuts down and waits for all workers
>>> to complete, which makes the assumptions in ExecReScanHashJoin() true.
>>> If a node below Gather but above Hash Join could initiate a rescan
>>> then the assumptions would not hold. I am not sure what it would mean
>>> though and we don't generate any such plans today to my knowledge. It
>>> doesn't seem to make sense for the inner side of Nested Loop to be
>>> partial. Have I missed something here?
>>
>> I bet this could happen, although recent commits have demonstrated
>> that my knowledge of how PostgreSQL handles rescans is less than
>> compendious. Suppose there's a Nested Loop below the Gather and above
>> the Hash Join, implementing a join condition that can't give rise to a
>> parameterized path, like a.x + b.x = 0.
>
> Hmm. I still don't see how that could produce a rescan of a partial
> path without an intervening Gather, and I would really like to get to
> the bottom of this.

I'm thinking about something like this:

Gather
  -> Nested Loop
       -> Parallel Seq Scan
       -> Hash Join
            -> Seq Scan
            -> Parallel Hash
                 -> Parallel Seq Scan

The hash join has to be rescanned for every iteration of the nested loop.
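
To make the rescan pattern concrete, here is a toy sketch of that plan shape. It is not PostgreSQL executor code: the classes SeqScan, HashJoin and NestedLoop, their rescan() methods, and the sample rows are all made up for illustration, and there is no parallelism or Gather in it at all. It rebuilds the hash table on every pass for simplicity, which is an assumption of the sketch and not a claim about what the real executor does. It only shows that the inner Hash Join is rescanned once per outer row when the join condition is something like a.x + b.x = 0.

```python
class SeqScan:
    """Toy scan node: yields its rows and counts rescans (hypothetical)."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.rescans = 0

    def rescan(self):
        self.rescans += 1

    def __iter__(self):
        return iter(self.rows)


class HashJoin:
    """Toy equality hash join; rebuilds its table on every pass (assumption)."""
    def __init__(self, outer, inner):
        self.outer = outer
        self.inner = inner
        self.rescans = 0

    def rescan(self):
        # Propagate the rescan to both children, as a plan node would.
        self.rescans += 1
        self.outer.rescan()
        self.inner.rescan()

    def __iter__(self):
        table = {}
        for row in self.inner:          # build phase
            table.setdefault(row, []).append(row)
        for row in self.outer:          # probe phase
            for match in table.get(row, []):
                yield (row, match)


class NestedLoop:
    """Toy nested loop: rescans the inner side for each outer row."""
    def __init__(self, outer, inner, qual):
        self.outer = outer
        self.inner = inner
        self.qual = qual

    def __iter__(self):
        for o in self.outer:
            self.inner.rescan()
            for i in self.inner:
                if self.qual(o, i):
                    yield (o, i)


a = SeqScan([1, -2, 3])                 # outer side of the nested loop
hash_join = HashJoin(SeqScan([2, -1, 5]), SeqScan([2, -1]))

# Join condition a.x + b.x = 0, which cannot be pushed down as a parameter.
plan = NestedLoop(a, hash_join, lambda o, pair: o + pair[0] == 0)
results = list(plan)

print(results)
print("hash join rescans:", hash_join.rescans)
```

With three outer rows, the toy Hash Join's rescan() is called three times, once per iteration of the nested loop.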

Maybe I'm confused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
