Re: asynchronous and vectorized execution

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: asynchronous and vectorized execution
Date: 2016-09-23 12:45:40
Message-ID: CAJ3gD9fZ=rtBZ0i1_pxycbkgxi=OzTgv1n0ojkmK318Mcc921A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13 September 2016 at 20:20, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Aug 29, 2016 at 4:08 AM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > [ new patches ]
>
> + /*
> + * We assume that few nodes are async-aware and async-unaware
> + * nodes cannot be revserse-dispatched from lower nodes that
> is
> + * async-aware. Firing of an async node that is not a
> descendant
> + * of the planstate will cause such reverse-diaptching to
> + * async-aware nodes, which is unexpected behavior for them.
> + *
> + * For instance, consider an async-unaware Hashjoin(OUTER,
> INNER)
> + * where the OUTER is running asynchronously but the Hashjoin
> is
> + * waiting on the async INNER during inner-hash creation. If
> the
> + * OUTER fires for the case, since anyone is waiting on it,
> + * ExecAsyncWaitForNode finally dispatches to the Hashjoin
> which
> + * is now in the middle of thing its work.
> + */
> + if (!IsParent(planstate, node))
> + continue;
>
> I'm not entirely sure that I understand this comment, but I don't
> think it's going in the right direction. Let's start with the example
> in the second paragraph. If the hash join is async-unaware, then it
> isn't possible for the hash join to be both running the outer side of
> the join asynchronously and at the same time waiting on the inner
> side. Once it tries to pull the first tuple from the outer side, it's
> waiting for that to finish and can't do anything else. So, the inner
> side can't possibly get touched in any way until the outer side
> finishes. For anything else to happen, the hash join would have to be
> async-aware. Even if we did that, I don't think it would be right to
> kick off both sides of the hash join at the same time. Right now, if
> the outer side turns out to be empty, we never need to build the hash
> table, and that's good.
>

I feel the !IsParent() condition is actually to prevent the infinite wait
caused by a re-entrant issue in ExecAsuncWaitForNode() that Kyotaro
mentioned
earlier. But yes, the comments don't explain exactly how the hash join can
cause the re-entrant issue.

But I attempted to come up with some testcase which might reproduce the
infinite-waiting in ExecAsyncWaitForNode() after removing the !IsParent()
check
so that the other subtree nodes are also included, but I couldn't reproduce.
Kyotaro, is it possible for you to give a testcase that consistently hangs
if
we revert back the !IsParent() check ?

I was also thinking about another possibility where the same plan state
node is
re-entered, as explained below.

>
> I don't think it's a good idea to wait for only nodes that are in the
> current subtree. For example, consider a plan like this:
>
> Append
> -> Foreign Scan on a
> -> Hash Join
> -> Foreign Scan on b
> -> Hash
> -> Seq Scan on x
>
> Suppose Append and Foreign Scan are parallel-aware but the other nodes
> are not. Append kicks off the Foreign Scan on a and then waits for
> the hash join to produce a tuple; the hash join kicks off the Foreign
> Scan on b and waits for it to return a tuple. If, while we're waiting
> for the foreign scan on b, the foreign scan on a needs some attention
> - either to produce tuples, or maybe just to call PQconsumeInput() so
> that more data can be sent from the other side, I think we need to be
> able to do that. There's no real problem here; even if the Append
> becomes result-ready before the hash join returns, that is fine.

Yes I agree : we should be able to do this. Sine we have all the waiting
events
in a common estate, there's no harm if we start executing nodes of another
sub-tree if we get an event from there.

But I am thinking about what would happen when this node from other sub-tree
returns result_ready, and then it's parents are called, and then the result
gets bubbled up upto the node which had already caused us to call
ExecAsyncWaitForNode() in the first place.

For e.g., in the above plan which you specified, suppose :
1. Hash Join has called ExecProcNode() for the child foreign scan b, and so
is
waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
2. The event wait list already has foreign scan on a that is on a different
subtree.
3. This foreign scan a happens to be ready, so in
ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called,
which returns with result_ready.
4. Since it returns result_ready, it's parent node is now inserted in the
callbacks array, and so it's parent (Append) is executed.
5. But, this Append planstate is already in the middle of executing Hash
join, and is waiting for HashJoin.

Is this safe to execute the same plan state when it is already inside its
execution ? In other words, is the plan state re-entrant ? I suspect, the
new
execution may even corrupt the structures with which it was already
executing.

In usual cases, a tree can contain multiple plan state nodes belonging to
the
same plan node, but in this case, we are having the same plan state node
being
executed again while it is already executing.

I suspect this can be one reason why Kyotaro might be getting infinite
recursion issues. May be we need to prevent a plan state node to re-enter,
but allow nodes from any subtree to execute. So propagate the result upwards
until we get a node which is already executing.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-09-23 12:46:45 Re: Speed up Clog Access by increasing CLOG buffers
Previous Message Peter Eisentraut 2016-09-23 12:41:38 Re: pageinspect: Hash index support