Re: asynchronous execution

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: asynchronous execution
Date: 2016-10-03 10:46:32
Message-ID: 20161003.194632.204401048.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the thought.

At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w(at)mail(dot)gmail(dot)com>
> [ Adjusting subject line to reflect the actual topic of discussion better. ]
>
> On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> >> For e.g., in the above plan which you specified, suppose :
> >> 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so
> >> is
> >> waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
> >> 2. The event wait list already has foreign scan on a that is on a different
> >> subtree.
> >> 3. This foreign scan a happens to be ready, so in
> >> ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called,
> >> which returns with result_ready.
> >> 4. Since it returns result_ready, it's parent node is now inserted in the
> >> callbacks array, and so it's parent (Append) is executed.
> >> 5. But, this Append planstate is already in the middle of executing Hash
> >> join, and is waiting for HashJoin.
> >
> > Ah, yeah, something like that could happen. I've spent much of this
> > week working on a new design for this feature which I think will avoid
> > this problem. It doesn't work yet - in fact I can't even really test
> > it yet. But I'll post what I've got by the end of the day today so
> > that anyone who is interested can look at it and critique.
>
> Well, I promised to post this, so here it is. It's not really working
> all that well at this point, and it's definitely not doing anything
> that interesting, but you can see the outline of what I have in mind.
> Since Kyotaro Horiguchi found that my previous design had a
> system-wide performance impact due to the ExecProcNode changes, I
> decided to take a different approach here: I created an async
> infrastructure where both the requestor and the requestee have to be
> specifically modified to support parallelism, and then modified Append
> and ForeignScan to cooperate using the new interface. Hopefully that
> means that anything other than those two nodes will suffer no
> performance impact. Of course, it might have other problems....

The previous framework didn't need to distinguish async-capable
and uncapable nodes from the parent node's view. The things in
ExecProcNode was required for the reason. Instead, this new one
removes the ExecProcNode stuff by distinguishing the two kinds of
node in async-aware parents, that is, Append. This no longer
involves async-unaware nodes into the tuple bubbling-up mechanism
so the reentrant problem doesn't seem to occur.

On the other hand, for example, the following plan, regrardless
of its practicality, (there should be more good example..)

(Async-unaware node)
- NestLoop
- Append
- n * ForegnScan
- Append
- n * ForegnScan

If the NestLoop, Append are async-aware, all of the ForeignScans
can run asynchronously with the previous framework. The topmost
NestLoop will be awakened after that firing of any ForenScans
makes a tuple bubbles up to the NestLoop. This is because the
not-need-to-distinguish-aware-or-not nature provided by the
ExecProcNode stuff.

On the other hand, with the new one, in order to do the same
thing, ExecAppend have in turn to behave differently whether the
parent is async or not. To do this will be bothersome but not
with confidence.

I examine this further intensively, especially for performance
degeneration and obstacles to complete this.

> Some notes:
>
> - EvalPlanQual rechecks are broken.
> - EXPLAIN ANALYZE instrumentation is broken.
> - ExecReScanAppend is broken, because the async stuff needs some way
> of canceling an async request and I didn't invent anything like that
> yet.
> - The postgres_fdw changes pretend to be async but aren't actually.
> It's just a demo of (part of) the interface at this point.
> - The postgres_fdw changes also report all pg-fdw paths as
> async-capable, but actually the direct-modify ones aren't, so the
> regression tests fail.
> - Errors in the executor can leak the WaitEventSet. Probably we need
> to modify ResourceOwners to be able to own WaitEventSets.
> - There are probably other bugs, too.
>
> Whee!
>
> Note that I've tried to solve the re-entrancy problems by (1) putting
> all of the event loop's state inside the EState rather than in local
> variables and (2) having the function that is called to report arrival
> of a result be thoroughly different than the function that is used to
> return a tuple to a synchronous caller.
>
> Comments welcome, if you're feeling brave enough to look at anything
> this half-baked.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Benedikt Grundmann 2016-10-03 10:48:15 Decoding proacl
Previous Message Christoph Berg 2016-10-03 10:43:44 Re: pgbench - allow backslash continuations in \set expressions