Re: Asynchronous execution on FDW

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: kaigai(at)ak(dot)jp(dot)nec(dot)com
Cc: robertmhaas(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Asynchronous execution on FDW
Date: 2015-07-24 06:10:59
Message-ID: 20150724.151059.102807210.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Thu, 23 Jul 2015 09:38:39 +0000, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote in <9A28C8860F777E439AA12E8AEA7694F80111BCEC(at)BPXM15GP(dot)gisp(dot)nec(dot)co(dot)jp>
> I had in mind workloads such as a single-shot scan of a large
> partitioned fact table on a DWH system. Yes, if the workload is
> expected to rescan frequently, its estimated cost will be higher
> than that of the existing Append (by the cost of launching
> bgworkers), and the planner will reject this path.
>
> Regarding the interaction between Limit and ParallelMergeAppend,
> it is probably the best scenario, isn't it? If Limit picks up
> the least 1000 rows from a partitioned table consisting of 20 child
> tables, ParallelMergeAppend can launch 20 parallel jobs, each of
> which picks up the least 1000 rows from its child relation.
> It is probably the same job that pass_down_bound() in nodeLimit.c does.

Yes, I was a bit confused. That scenario is one of the least
problematic cases.
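
To make the Limit case above concrete, here is a minimal sketch in
Python rather than executor code. It assumes each child job can return
its own least N rows already sorted. The names top_n_per_child and
merge_with_limit are hypothetical and exist only for this illustration;
they are not part of any patch in this thread.

```python
import heapq
from itertools import islice


def top_n_per_child(child_rows, n):
    """Stand-in for one child job: return that child's n smallest rows, sorted."""
    return heapq.nsmallest(n, child_rows)


def merge_with_limit(children, n):
    """Merge the per-child sorted results and stop after n rows.

    Each child contributes at most n rows, so the merge never looks at
    more than len(children) * n rows, however large the children are.
    """
    partial_results = [top_n_per_child(rows, n) for rows in children]
    return list(islice(heapq.merge(*partial_results), n))
```

With 20 children and N = 1000 the merge step sees at most 20000 rows,
which is why pushing the bound down to each child makes this an easy
case.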

> > As for ForeignScan, it is merely an API for FDWs and does nothing
> > substantial, so it would have nothing special to do. As for
> > postgres_fdw, the current patch by itself restricts execution to
> > one scan per foreign server at a time. We would have to provide
> > another execution management mechanism if we want two or more
> > simultaneous scans per foreign server.
> >
> Yep, your 4th patch defines a new callback in FdwRoutine and the
> 5th patch implements the postgres_fdw-specific portion.
> It should work well for distributed / sharded database environments;
> however, its benefit is limited to ForeignScan.
> Once a management node kicks off the underlying SeqScan, ForeignScan
> or other nodes in parallel, it also becomes possible to run local
> heap scans asynchronously.

I suppose SeqScan doesn't need an async kick since its startup cost is
extremely low, next to nothing. (Would fetching the first several
pages in advance boost seqscans?) On the other hand, sort/hash would
be an area where asynchronous execution is effective.

> > Sorry for the unfocused discussion, but does this answer some of
> > your questions?
> >
> Hmm... Its advantage is still unclear to me. However, it is not
> fair to hijack this thread with my idea.

It would be more advantageous once join/sort pushdown to FDWs
arrives, where the start-up cost could be extremely high...
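
As a rough illustration of why start-up cost matters here, the toy
model below compares starting child nodes one after another with
kicking them all at once. It is only a sketch under strong
assumptions: start-up costs are the only thing modelled, overlapped
start-ups are assumed to run fully in parallel with no contention, and
the function names are hypothetical.

```python
def sequential_startup_wait(startup_costs):
    """Children are started one by one, so their start-up costs add up."""
    return sum(startup_costs)


def overlapped_startup_wait(startup_costs):
    """All children are kicked at once, so the slowest start-up dominates."""
    return max(startup_costs, default=0)


def async_kick_saving(startup_costs):
    """Start-up wait that overlapping removes in this toy model."""
    return sequential_startup_wait(startup_costs) - overlapped_startup_wait(startup_costs)
```

In this model, children with near-zero start-up cost (the SeqScan
case) gain almost nothing, while children with a high start-up cost
(a pushed-down sort or join) gain the most.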

> I'll submit my design proposal for ParallelAppend to the
> next commitfest. Please comment on it.

OK, I'll join the discussion there.

> > > The expected waste of CPU or I/O is a common problem to be solved;
> > > however, I think it does not require adding special-case handling
> > > to ForeignScan. What is your opinion?
> >
> > I agree with you that ForeignScan, as the wrapper for FDWs, doesn't
> > need anything special for this case. I suppose for now that
> > avoiding the penalty from abandoning too many speculatively
> > executed scans (or other work on bg workers, like sorts) would be
> > the business of the node above the FDWs, or somewhere else.
> >
> > However, I haven't dismissed the possibility that some common
> > work related to resource management could be integrated into the
> > executor (or even into the planner), but I see none for now.
> >
> I also agree that it is "eventually" needed, but it may not be
> supported in the first version.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
