Re: Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-09-01 22:32:16
Message-ID: CAEepm=2q6BnXdiySNWmf+5y3K_ZF+Kq-vgULka4HW6cjrJgj8g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 2, 2017 at 5:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Aug 31, 2017 at 8:53 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Check out ExecReScanGather(): it shuts down and waits for all workers
>> to complete, which makes the assumptions in ExecReScanHashJoin() true.
>> If a node below Gather but above Hash Join could initiate a rescan
>> then the assumptions would not hold. I am not sure what it would mean
>> though and we don't generate any such plans today to my knowledge. It
>> doesn't seem to make sense for the inner side of Nested Loop to be
>> partial. Have I missed something here?
>
> I bet this could happen, although recent commits have demonstrated
> that my knowledge of how PostgreSQL handles rescans is less than
> compendious. Suppose there's a Nested Loop below the Gather and above
> the Hash Join, implementing a join condition that can't give rise to a
> parameterized path, like a.x + b.x = 0.

Hmm. I still don't see how that could produce a rescan of a partial
path without an intervening Gather, and I would really like to get to
the bottom of this.

At the risk of mansplaining the code that you wrote and turning out to
be wrong: A Nested Loop can't ever have a partial path on the inner
side. Under certain circumstances it can have a partial path on the
outer side, because its own results are partial, but for each outer
row it needs to do a total (non-partial) scan of the inner side so
that it can reliably find or not find matches. Therefore we'll never
rescan partial paths directly; we'll only ever rescan partial paths
indirectly via a Gatheroid node that will synchronise the rescan of
all children to produce a non-partial result.
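
To make the "total scan of the inner side" point concrete, here is a toy
model in Python. It is only an illustration, not PostgreSQL code: the
relations, the two-worker split and the helper names are all made up,
and the join condition is the a.x + b.x = 0 example from the quoted
mail. Each worker takes a share of the outer rows; the join is complete
only if every worker scans the whole inner side.

```python
def nested_loop(outer, inner, pred):
    """Plain nested loop: scan all of `inner` once per outer row."""
    return [(o, i) for o in outer for i in inner if pred(o, i)]

a = list(range(-3, 4))   # outer relation (hypothetical data)
b = list(range(-3, 4))   # inner relation (hypothetical data)
pred = lambda x, y: x + y == 0

# Reference answer: one process, no parallelism.
expected = sorted(nested_loop(a, b, pred))

# Which rows a worker receives from a partial scan is arbitrary. Assumed
# here: two workers, outer rows dealt round-robin, inner rows cut in halves.
outer_shares = [a[0::2], a[1::2]]
inner_shares = [b[:4], b[4:]]

# Partial outer, total inner: each worker joins its outer share against all of b.
ok = sorted(t for part in outer_shares for t in nested_loop(part, b, pred))

# Partial outer, partial inner: each worker only sees its own inner share.
bad = sorted(
    t
    for pa, pb in zip(outer_shares, inner_shares)
    for t in nested_loop(pa, pb, pred)
)

print(ok == expected)             # partial outer alone loses nothing
print(len(bad), len(expected))    # a partial inner side misses matches
```

With this particular split the partial-inner variant finds only some of
the matching pairs, because a worker never sees inner rows that were
handed to the other worker.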

There may be more reasons to rescan that I'm not thinking of. But the
whole idea of a rescan seems to make sense only for non-partial paths.
What would it even mean for a worker process to decide to rescan (say)
a Seq Scan without any kind of consensus?

Thought experiment: I suppose we could consider replacing Gather's
clunky shut-down-and-relaunch-workers synchronisation technique with a
new protocol where the Gather node sends a 'rescan!' message to each
worker and then discards their tuples until it receives 'OK, rescan
starts here', and then each parallel-aware node type supplies its own
rescan synchronisation logic as appropriate. For example, Seq Scan
would somehow need to elect one participant to run
heap_parallelscan_reinitialize and the others would wait until it has
finished. This might not be worth the effort, but thinking about this
problem helped me see that a rescan of a partial plan without a Gather
node to coordinate doesn't make any sense.
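
A rough sketch of how that thought experiment might look, using Python
threads as stand-ins for worker processes. Everything here is
hypothetical: SharedScan, the marker string and the per-worker queues
are invented for illustration and do not correspond to real PostgreSQL
structures or messages. The election relies on threading.Barrier.wait()
returning a distinct index to each thread, so whichever thread gets
index 0 does the reinitialisation, and a second wait holds the others
back until it has finished.

```python
import threading
from collections import deque

RESCAN_STARTS_HERE = "OK, rescan starts here"   # hypothetical marker message

class SharedScan:
    """Stand-in for shared parallel scan state (not PostgreSQL's real struct)."""
    def __init__(self):
        self.next_block = 17      # pretend the previous scan got this far
        self.reinit_calls = 0

    def reinitialize(self):
        self.next_block = 0
        self.reinit_calls += 1

def worker_rescan(scan, barrier, out_queue, seen):
    # Phase 1: everyone arrives; the thread handed index 0 is the elected one.
    if barrier.wait() == 0:
        scan.reinitialize()
    # Phase 2: nobody proceeds until the elected participant has finished.
    barrier.wait()
    seen.append(scan.next_block)
    out_queue.append(RESCAN_STARTS_HERE)

def gather_discard_stale(queue):
    """Gather side: drop tuples that precede the marker, return how many."""
    discarded = 0
    while queue:
        if queue.popleft() == RESCAN_STARTS_HERE:
            return discarded
        discarded += 1
    raise RuntimeError("worker never acknowledged the rescan")

def run(nworkers=3):
    scan = SharedScan()
    barrier = threading.Barrier(nworkers)
    # Each worker's queue already holds tuples from before the rescan request.
    queues = [deque(["stale tuple"] * (w + 1)) for w in range(nworkers)]
    seen = []
    threads = [
        threading.Thread(target=worker_rescan, args=(scan, barrier, q, seen))
        for q in queues
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    discarded = [gather_discard_stale(q) for q in queues]
    return scan.reinit_calls, seen, discarded

calls, seen, discarded = run(3)
print(calls, seen, discarded)
```

In this toy the shared state is reinitialised exactly once, every worker
observes the reset position before producing anything new, and the
Gather side throws away whatever was queued ahead of each marker.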

Am I wrong?

--
Thomas Munro
http://www.enterprisedb.com
