Re: Introducing coarse grain parallelism by postgres_fdw.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introducing coarse grain parallelism by postgres_fdw.
Date: 2014-08-08 06:23:13
Message-ID: CAFjFpRfdt+kUN5HKR8PhPhsOZBGe1zQYoubmQ9CTVKkDLHqtwg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 8, 2014 at 8:54 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hi, thank you for the comment.
>
> > Hi Kyotaro,
> > I looked at the patches and felt that the approach taken here is too
> > intrusive, considering that the feature is only for foreign scans.
>
> I agree with you, on the premise that it is only for foreign scans, but I
> regard it as an example of parallel execution planning.
>
> > There are quite a few members added to the generic Path, Plan structures,
> > whose use is induced only through foreign scans. Each path now stores
> > two sets of costs, one with parallelism and one without. The parallel
> > values will make sense only when there is a foreign scan, which uses
> > parallelism, in the plan tree. So, those costs are maintained
> > unnecessarily, or the memory for those members is wasted in most of the
> > cases, where the tables involved are not foreign. Also, not many foreign
> > tables will be able to use the parallelism, e.g. file_fdw. Although,
> > that's my opinion; I would like to hear from others.
>
> I intended to discuss what the estimation and planning for
> parallel execution (not limited to foreign scans) would be
> like. A background worker would be able to take on executing some
> portion of the path tree in 'parallel'. The postgres_fdw in this
> patch is simply one case of planning for parallel
> execution. Although, as you see, it only chooses whether
> to go parallel for a path constructed without regard to parallel
> execution, considering the possible alternative paths for
> parallel execution would cost too much.
>
> Limiting this discussion to parallel scans, the overall gain
> from multiple simultaneous scans distributed in the path/plan tree
> won't be known before cost counting is done up to the root node
> (more precisely, their common parent). This patch foolishly
> does a bucket brigade of parallel costs up to the root node, but there
> should be a smarter way to shortcut it, for example, simply
> picking up parallelizable nodes by scanning the completed path/plan
> tree and calculating the probably-eliminable costs from them, then
> subtracting that from or comparing it to the total (nonparallel) cost.
> This might be more acceptable for everyone than the current
> implementation.
>
>
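The shortcut described in the quoted paragraph above could be sketched
roughly as follows. This is only a toy model in Python, not code from the
patch: the Node class and all function names are hypothetical, and the rule
that overlapping scans take as long as the slowest one (so the eliminable
cost is the sum minus the maximum) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a finished plan node (not a PostgreSQL struct)."""
    name: str
    own_cost: float                 # cost of this node, excluding its children
    parallelizable: bool = False    # e.g. a foreign scan that can run asynchronously
    children: List["Node"] = field(default_factory=list)


def total_cost(node: Node) -> float:
    """Ordinary (nonparallel) cost: this node plus everything below it."""
    return node.own_cost + sum(total_cost(c) for c in node.children)


def parallel_scans(node: Node) -> List[Node]:
    """Collect parallelizable nodes by walking the completed tree once."""
    found = [node] if node.parallelizable else []
    for child in node.children:
        found.extend(parallel_scans(child))
    return found


def eliminable_cost(root: Node) -> float:
    """Assumption: scans that overlap cost as much as the slowest one,
    so everything except the most expensive scan can be eliminated."""
    costs = [n.own_cost for n in parallel_scans(root)]
    if len(costs) < 2:
        return 0.0
    return sum(costs) - max(costs)


def estimate(root: Node):
    """Return (nonparallel cost, cost after subtracting the eliminable part)."""
    serial = total_cost(root)
    return serial, serial - eliminable_cost(root)


plan = Node("join", 10.0, children=[
    Node("foreign scan a", 100.0, parallelizable=True),
    Node("foreign scan b", 60.0, parallelizable=True),
    Node("local scan c", 30.0),
])
serial, with_parallel = estimate(plan)
```

The costs are computed once on the finished tree, so nothing has to be
carried in every Path or Plan node on the way up to the root.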
Planning for parallel execution would be a much harder problem to solve.
Just to give a glimpse: how many worker backends can be spawned depends
entirely on the load at the time of execution. For prepared queries, the
load condition can change between planning and execution, and thus the
number of parallel backends, which would decide the actual execution
time and hence the cost, cannot be estimated at planning time.
Mixing that parallelism with the FDW's parallelism would make things even
more complicated. I think those two problems are to be solved in different
ways.
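
To make the prepared-query problem concrete, here is a toy cost model in
Python. The formula (work divided evenly among workers, plus a fixed startup
cost per worker) and the name parallel_cost are assumptions for
illustration only; this is not how PostgreSQL computes costs.

```python
def parallel_cost(serial_cost: float, workers: int,
                  startup_per_worker: float = 10.0) -> float:
    """Toy model (assumption): the work divides evenly among the workers,
    and each worker adds a fixed startup cost."""
    if workers <= 1:
        return serial_cost
    return serial_cost / workers + startup_per_worker * workers


# At planning time the planner might assume that 4 workers are available ...
planned = parallel_cost(1000.0, workers=4)

# ... but if the server is loaded at execution time and no extra worker
# can be spawned, the same plan runs at the serial cost.
actual = parallel_cost(1000.0, workers=1)
```

The same plan gets two very different costs depending on a number that is
known only at execution time.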

> > Instead, an FDW which can use parallelism can add two paths, one with and
> > one without parallelism, with appropriate costs and let the logic choosing
> > the cheapest path take care of the actual choice. In fact, I thought
> > parallelism would always be faster than the non-parallel one, except when
> > the foreign server is too heavily loaded. But we won't be able to check
> > that anyway. Can you point out a case where the parallelism may not win
> > over serial execution?
>
> It always wins against serial execution if parallel execution can be
> launched with no extra cost. But actually it costs extra resources,
> so I thought that parallel execution should be curbed when the gain
> is small. That is what the two GUCs added by this patch and
> choose_parallel_scans() do, although in a non-automated way. The
> overloading issue is not confined to parallel execution,
> but surely it will be more severe there since it is less visible to and
> less controllable by users. However, it would have to come down to
> manual configuration in the end anyway.
>

I am not sure whether the way this patch provides manual control is really
effective or ineffective without understanding the full impact. Do we have
any numbers to show the cases when parallelism would be effective and when
it would not, and how those GUCs help choose the effective one?
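
As a rough illustration of the two-path alternative quoted above, combined
with the idea of curbing parallelism when the gain is small, consider this
toy Python sketch. The Path class, choose_path and the min_gain knob are all
hypothetical; min_gain merely stands in for whatever the patch's GUCs
control and is not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical candidate path; only the total cost matters here."""
    label: str
    total_cost: float


def choose_path(serial: Path, parallel: Path, min_gain: float = 0.0) -> Path:
    """Pick the parallel path only if it saves more than min_gain cost units.

    With min_gain = 0 this is plain cheapest-path selection (ties go to
    the serial path). A larger min_gain curbs parallelism for small gains.
    """
    gain = serial.total_cost - parallel.total_cost
    return parallel if gain > min_gain else serial


serial = Path("serial foreign scan", 100.0)
parallel = Path("asynchronous foreign scan", 90.0)

plain_choice = choose_path(serial, parallel)                  # cheapest wins
curbed_choice = choose_path(serial, parallel, min_gain=20.0)  # gain too small
```

Numbers of this kind, measured rather than invented, are what would show
whether such a threshold actually helps.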

>
> > BTW, the name parallelism seems to be misleading here. All it will be
> > able to do is fire the queries (or data fetch requests) asynchronously.
> > So, we might want to change the naming appropriately.
>
> That is right regarding what I actually did to postgres_fdw. But unless
> all intermediate tuples from child execution nodes running in
> parallel are allowed to pile up in memory without restriction, I suppose
> all 'parallel' execution is a kind of this 'asynchronous'
> startup/fetching. As for postgres_fdw, it would look more like
> 'parallel' (and perhaps be more efficient) by processing queries
> using libpq's single-row mode instead of a cursor, but similar
> processing takes place under system calls even in that case.
>
>
By single-row mode, do you mean executing FETCH for every row? That wouldn't
be efficient, since each row would then incur a messaging cost between the
local and foreign servers, which cannot be neglected, for libpq at least.
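
If it does mean one FETCH per row, the messaging overhead can be estimated
with a small Python calculation. The function names, the unit cost per
round trip and the batch size of 100 are illustrative assumptions, not
measurements.

```python
def round_trips(total_rows: int, rows_per_fetch: int) -> int:
    """FETCH round trips needed to drain a cursor (integer ceiling
    division), ignoring the final fetch that detects the end of data."""
    return (total_rows + rows_per_fetch - 1) // rows_per_fetch


def messaging_cost(total_rows: int, rows_per_fetch: int,
                   cost_per_round_trip: float = 1.0) -> float:
    """Assumption: every round trip costs the same fixed amount."""
    return round_trips(total_rows, rows_per_fetch) * cost_per_round_trip


per_row = messaging_cost(10_000, rows_per_fetch=1)    # one FETCH per row
batched = messaging_cost(10_000, rows_per_fetch=100)  # 100 rows per FETCH
```

Under these assumptions, fetching row by row pays the round-trip cost 100
times as often as fetching in batches of 100.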

>
> Well, I will try to make a version that does not include parallel costs
> in the plan/path structs, and that uses single-row mode for postgres_fdw.
> I hope it will lead somewhere.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
