Re: Parallel Hash take II

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-09-01 17:13:22
Message-ID: CA+TgmoY4LogYcg1y5JPtto_fL-DBUqvxRiZRndDC70iFiVsVFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 31, 2017 at 8:53 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Solution 2: Teach tuple queues to spill to disk instead of blocking
> when full. I think this behaviour should probably only be activated
> while the leader is running the plan rather than draining tuple
> queues; the current block-when-full behaviour would still be
> appropriate if the leader is simply unable to drain queues fast
> enough. Then the deadlock risk would go away.
>
> When I wrote it, I figured that leader_gate.c was cheap and would do
> for now, but I have to admit that it's quite confusing and it sucks
> that later batches lose a core. I'm now thinking that 2 may be a
> better idea. My first thought is that Gather needs a way to advertise
> that it's busy while running the plan, shm_mq needs a slightly
> different all-or-nothing nowait mode, and TupleQueue needs to write to
> a shared tuplestore or other temp file-backed mechanism when
> appropriate. Thoughts?

The problem with solution 2 is that it might lead to either (a)
unbounded amounts of stuff getting spooled to files or (b) small spool
files being repeatedly created and deleted depending on how the leader
is spending its time. If you could spill only when the leader is
actually waiting for the worker, that might be OK.

> Check out ExecReScanGather(): it shuts down and waits for all workers
> to complete, which makes the assumptions in ExecReScanHashJoin() true.
> If a node below Gather but above Hash Join could initiate a rescan
> then the assumptions would not hold. I am not sure what it would mean
> though and we don't generate any such plans today to my knowledge. It
> doesn't seem to make sense for the inner side of Nested Loop to be
> partial. Have I missed something here?

I bet this could happen, although recent commits have demonstrated
that my knowledge of how PostgreSQL handles rescans is less than
compendious. Suppose there's a Nested Loop below the Gather and above
the Hash Join, implementing a join condition that can't give rise to a
parameterized path, like a.x + b.x = 0.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-09-01 17:15:24 Re: OpenFile() Permissions Refactor
Previous Message Tom Lane 2017-09-01 17:10:23 Re: GnuTLS support