Re: Parallel Seq Scan

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-04-08 10:00:36
Message-ID: CAApHDvoSrbWgNmaoVT1q5uF7en2jQiP0bUjdXaUAns930rjM4A@mail.gmail.com
Lists: pgsql-hackers

On 8 April 2015 at 15:46, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Wed, Apr 8, 2015 at 1:53 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> >
> > David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >
> > > If we attempt to do this parallel stuff at plan time, and we
> > > happen to plan at some quiet period, or perhaps worse, some
> > > application's start-up process happens to PREPARE a load of
> > > queries when the database is nice and quiet, then quite possibly
> > > we'll end up with some highly parallel queries. Then perhaps come
> > > the time these queries are actually executed the server is very
> > > busy... Things will fall apart quite quickly due to the masses of
> > > IPC and context switches that would be going on.
> > >
> > > I completely understand that this parallel query stuff is all
> > > quite new to us all and we're likely still trying to nail down
> > > the correct infrastructure for it to work well, so this is why
> > > I'm proposing that the planner should know nothing of parallel
> > > query, instead I think it should work more along the lines of:
> > >
> > > * Planner should be completely oblivious to what parallel query
> > > is.
> > > * Before executor startup the plan is passed to a function which
> > > decides if we should parallelise it, and does so if the plan
> > > meets the correct requirements. This should likely have a very
> > > fast exit path such as:
> > > if root node's cost < parallel_query_cost_threshold
> > >     return; /* the query is not expensive enough to attempt
> > >                to make parallel */
> > >
> > > The above check will allow us to have an almost zero overhead for
> > > small low cost queries.
> > >
> > > This function would likely also have some sort of logic in order
> > > to determine if the server has enough spare resource at the
> > > current point in time to allow queries to be parallelised
> >
> > There is a lot to like about this suggestion.
> >
> > I've seen enough performance crashes due to too many concurrent
> > processes (even when each connection can only use a single process)
> > to believe that, for a plan which will be saved, it is possible to
> > know at planning time whether parallelization will be a nice win or
> > a devastating over-saturation of resources during some later
> > execution phase.
> >
> > Another thing to consider is that this is not entirely unrelated to
> > the concept of admission control policies. Perhaps this phase
> > could be a more general execution start-up admission control phase,
> > where parallel processing would be one adjustment that could be
> > considered.
>
> I think there is always a chance that resources (like parallel workers)
> won't be available at run-time even if we decide about them at the
> executor-start phase, unless we block them for that node's usage; and
> OTOH, if we block (allocate) those resources during the executor-start
> phase, we might end up blocking them too early, or they may never get
> used at all if we decide not to execute that node. On that basis, the
> current strategy seems reasonable to me: we decide during planning, and
> later, at execution time, if not all resources (particularly parallel
> workers) are available, we use only the available ones to execute the
> plan. Going forward, I think we can improve this if we decide not to
> shut down parallel workers until postmaster shutdown once they are
> started, and then just allocate them during the executor-start phase.
>
Yeah, but what about the case where workers are not available at run-time
and the plan was only a win because the planner assumed there would be
lots of workers? A cheaper serial plan may already have been thrown out
by the planner, and that plan is no longer available to the executor.

If the planner didn't know about parallelism then we'd already have the
most optimal plan and it would be no great loss if no workers were around
to help.

Regards

David Rowley
