Re: Asynchronous execution on FDW

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnaka(at)iki(dot)fi
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Asynchronous execution on FDW
Date: 2015-07-07 01:19:35
Message-ID: 20150707.101935.28049720.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, thank you for looking at this.

If it is acceptable to restructure the executor nodes to have an
additional return state such as PREP_RUN (meaning that one more
call is needed to get the first tuple), I'll modify the whole
executor to handle that state in the next patch.

I haven't taken the advice I have received so far in this sense,
but I have come to think that it is the most reasonable way to
solve this.

======
> > - It was a problem when to give the first kick for async exec. It
> > is not in ExecInit phase, and ExecProc phase does not fit,
> > too. An extra phase ExecPreProc or something is too
> > invasive. So I tried "pre-exec callback".
> >
> > Any init-node can register callbacks on their turn, then the
> > registerd callbacks are called just before ExecProc phase in
> > executor. The first patch adds functions and structs to enable
> > this.
>
> At a quick glance, I think this has all the same problems as starting
> the execution at ExecInit phase. The correct way to do this is to kick
> off the queries in the first IterateForeignScan() call. You said that
> "ExecProc phase does not fit" - why not?

Execution nodes are expected to return the first tuple if one is
available, but asynchronous execution cannot return the first
tuple immediately. Starting execution for the first tuple
simultaneously on every foreign node is more crucial than
asynchronous fetching in many cases, especially in cases like
sort/agg pushdown on FDW.

The reason ExecProc does not fit is that a first loop that
returns no tuple looks like it would affect too large a portion
of the executor.

It is my mistake that the patch doesn't address the problem of
parameterized paths. Parameterized paths have to be executed
within ExecProc loops, so this patch would become something like
the following.

- To gain the advantage of kicking off execution before the first
ExecProc loop, non-parameterized paths are started using the
callback feature this patch provides.

- Parameterized paths need the upper nodes to be executed before
they start, so they should be started in the ExecProc loop, but
run asynchronously if possible.
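
To make the split between the two kinds of paths concrete, here is
a minimal standalone sketch in C. It is not code from the patch and
does not use PostgreSQL's real structs: ScanNode, ExecState,
register_pre_exec_callback(), run_pre_exec_callbacks() and
exec_proc() are all hypothetical names, and "sending the query" is
reduced to recording the phase in which the node was started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical executor phases, for illustration only. */
typedef enum { PHASE_INIT, PHASE_PRE_EXEC, PHASE_EXEC_PROC } Phase;

/* Hypothetical stand-in for a foreign scan node. */
typedef struct ScanNode {
    bool  parameterized; /* needs values from upper nodes before running */
    bool  started;       /* remote query has been sent */
    Phase started_in;    /* phase in which the query was sent */
} ScanNode;

struct ExecState;
typedef void (*PreExecCallback)(struct ExecState *es, ScanNode *node);

#define MAX_CALLBACKS 16

typedef struct ExecState {
    Phase           phase;
    size_t          ncallbacks;
    PreExecCallback callbacks[MAX_CALLBACKS];
    ScanNode       *args[MAX_CALLBACKS];
} ExecState;

/* Pretend to send the remote query; a second call is a no-op. */
void send_query(ExecState *es, ScanNode *node)
{
    if (node->started)
        return;
    node->started = true;
    node->started_in = es->phase;
}

/* Called by a node during its init turn. */
void register_pre_exec_callback(ExecState *es, PreExecCallback cb,
                                ScanNode *node)
{
    assert(es->ncallbacks < MAX_CALLBACKS);
    es->callbacks[es->ncallbacks] = cb;
    es->args[es->ncallbacks] = node;
    es->ncallbacks++;
}

/* Init phase: only non-parameterized nodes register a callback. */
void init_node(ExecState *es, ScanNode *node, bool parameterized)
{
    node->parameterized = parameterized;
    node->started = false;
    node->started_in = PHASE_INIT;
    if (!parameterized)
        register_pre_exec_callback(es, send_query, node);
}

/* Runs once, just before the first ExecProc loop. */
void run_pre_exec_callbacks(ExecState *es)
{
    es->phase = PHASE_PRE_EXEC;
    for (size_t i = 0; i < es->ncallbacks; i++)
        es->callbacks[i](es, es->args[i]);
}

/* ExecProc phase: a node that was not started yet starts here. */
void exec_proc(ExecState *es, ScanNode *node)
{
    es->phase = PHASE_EXEC_PROC;
    send_query(es, node);
}
```

In this sketch, a node initialized with parameterized = false is
started by run_pre_exec_callbacks(), while a parameterized node
stays untouched until exec_proc() reaches it.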

This is rather a makeshift solution to the problem, but
considering the current trend toward parallelism, it might be
time to make the executor fit parallel execution.

If it is acceptable to restructure the executor nodes to have an
additional return state such as PREP_RUN (meaning that one more
call is needed to get the first tuple), I'll modify the whole
executor to handle that state in the next patch.
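
As a rough illustration of what such a return state could look
like, here is a small standalone sketch in C. It is only an
assumption about the shape of the idea, not the planned patch:
ExecStatus, EXEC_PREP_RUN, AsyncScan, AppendNode and drain() are
hypothetical names, and tuples are plain integers. The first call
on a scan only starts it and reports EXEC_PREP_RUN; an Append-like
parent uses that first call to start all of its children before
any tuple is fetched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node return states; EXEC_PREP_RUN means
 * "started, call once more to get the first tuple". */
typedef enum { EXEC_TUPLE, EXEC_PREP_RUN, EXEC_EOF } ExecStatus;

/* Hypothetical async scan producing the integers [next, end). */
typedef struct AsyncScan {
    bool started;
    int  next;
    int  end;
} AsyncScan;

ExecStatus async_scan_next(AsyncScan *scan, int *tuple)
{
    if (!scan->started) {
        scan->started = true;   /* the remote query would be sent here */
        return EXEC_PREP_RUN;
    }
    if (scan->next >= scan->end)
        return EXEC_EOF;
    *tuple = scan->next++;
    return EXEC_TUPLE;
}

/* Hypothetical Append-like parent over several async scans. */
typedef struct AppendNode {
    AsyncScan *children;
    size_t     nchildren;
    size_t     current;
    bool       prepared;
} AppendNode;

ExecStatus append_next(AppendNode *node, int *tuple)
{
    if (!node->prepared) {
        /* First call: start every child so that all of them run
         * simultaneously, then tell the caller to call again. */
        for (size_t i = 0; i < node->nchildren; i++)
            (void) async_scan_next(&node->children[i], tuple);
        node->prepared = true;
        return EXEC_PREP_RUN;
    }
    while (node->current < node->nchildren) {
        if (async_scan_next(&node->children[node->current], tuple)
            == EXEC_TUPLE)
            return EXEC_TUPLE;
        node->current++;        /* this child is exhausted */
    }
    return EXEC_EOF;
}

/* Caller loop that tolerates EXEC_PREP_RUN by simply calling again. */
size_t drain(AppendNode *node, int *out, size_t cap)
{
    size_t n = 0;
    int    tuple = 0;

    for (;;) {
        ExecStatus st = append_next(node, &tuple);
        if (st == EXEC_PREP_RUN)
            continue;
        if (st == EXEC_EOF)
            break;
        assert(n < cap);
        out[n++] = tuple;
    }
    return n;
}
```

The cost this sketch makes visible is that every caller of a node
has to cope with the extra state, as drain() does here, which is
why the change would touch the whole executor.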

I hate my stupidity if this is the kind of solution you meant by
"do it in ExecProc" :(

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
