Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-08 11:42:59
Message-ID: CAA4eK1KLyPUz9MVz7FubM0W6ANSk+2mnCePLr7AUXW1iN0YNtQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 5, 2015 at 8:31 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Jan 2, 2015 at 5:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Jan 1, 2015 at 11:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Thu, Jan 1, 2015 at 12:00 PM, Fabrízio de Royes Mello
> >> <fabriziomello(at)gmail(dot)com> wrote:
> >> > Can we check the number of free bgworker slots to set the max workers?
> >>
> >> The real solution here is that this patch can't throw an error if it's
> >> unable to obtain the desired number of background workers. It needs
> >> to be able to smoothly degrade to a smaller number of background
> >> workers, or none at all.
> >
> > I think handling this way can have one side effect which is that if
> > we degrade to smaller number, then the cost of plan (which was
> > decided by optimizer based on number of parallel workers) could
> > be more than non-parallel scan.
> > Ideally before finalizing the parallel plan we should reserve the
> > bgworkers required to execute that plan, but I think as of now
> > we can work out a solution without it.
>
> I don't think this is very practical. When cached plans are in use,
> we can have a bunch of plans sitting around that may or may not get
> reused at some point in the future, possibly far in the future. The
> current situation, which I think we want to maintain, is that such
> plans hold no execution-time resources (e.g. locks) and, generally,
> don't interfere with other things people might want to execute on the
> system. Nailing down a bunch of background workers just in case we
> might want to use them in the future would be pretty unfriendly.
>
> I think it's right to view this in the same way we view work_mem. We
> plan on the assumption that an amount of memory equal to work_mem will
> be available at execution time, without actually reserving it.

Are we sure that in such cases we will actually consume work_mem during
execution? With parallel workers, we can be reasonably sure that if
we reserve the workers, we will use them during execution.
Nonetheless, I have proceeded and integrated the parallel_seqscan
patch with v0.3 of the parallel_mode patch you posted at the link below:
http://www.postgresql.org/message-id/CA+TgmoYmp_=XcJEhvJZt9P8drBgW-pDpjHxBhZA79+M4o-CZQA@mail.gmail.com

A few things to note about this integrated patch:
1. In this new patch, I have just integrated it with Robert's
parallel_mode patch and not done any further development or fixed
known issues such as changes in the optimizer, prepared queries, etc.
You might notice that the new patch is smaller than the previous one;
that is because there was some duplication between the previous
version of the parallel_seqscan patch and parallel_mode, which I have
eliminated.

2. To enable two types of shared memory queues (error queue and
tuple queue), we need to ensure that we switch to the appropriate
queue when communicating various messages from the parallel worker
to the master backend. There are two ways to do it:
a. Save the information about the error queue during startup of the
parallel worker (ParallelMain()) and then, on error, switch to the
error queue in errstart() and switch back to the tuple queue in
errfinish() (and in errstart() in case errstart() decides not to
propagate the error).
b. Do something similar to (a) for the tuple queue in printtup, or
wherever else non-error messages are sent.
I think approach (a) is slightly better than approach (b), because
with (b) we would need to switch to the tuple queue many times (once
per tuple), and there could be multiple places where we would need to
do so. For now, I have used approach (a) in the patch; it needs some
more work if we agree on it. A rough sketch of the idea follows.
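
To make approach (a) concrete, below is a minimal sketch. The helper
names (pq_mq_save_queues, pq_mq_switch_to_error_queue,
pq_mq_switch_to_tuple_queue) are made up for illustration only; the
real patch has to wire them into errstart()/errfinish(), and the
worker's pq_put* traffic would consult active_queue:

#include "postgres.h"
#include "storage/shm_mq.h"

/* Queues set up during parallel worker startup (ParallelMain()). */
static shm_mq_handle *error_queue = NULL;	/* error messages */
static shm_mq_handle *tuple_queue = NULL;	/* tuples, other data */

/* The queue that outgoing messages are currently directed to. */
static shm_mq_handle *active_queue = NULL;

/* Remember both queues once, at worker startup. */
void
pq_mq_save_queues(shm_mq_handle *errq, shm_mq_handle *tupq)
{
	error_queue = errq;
	tuple_queue = tupq;
	active_queue = tupq;	/* tuples are the common case */
}

/* Called from errstart(): route the next message to the error queue. */
void
pq_mq_switch_to_error_queue(void)
{
	active_queue = error_queue;
}

/*
 * Called from errfinish(), and from errstart() when it decides not to
 * propagate the error: resume sending to the tuple queue.
 */
void
pq_mq_switch_to_tuple_queue(void)
{
	active_queue = tuple_queue;
}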

3. The current implementation of parallel_seqscan needs some
information from parallel.c that was not exposed, so I have exposed
it by moving it to parallel.h. The information required is
ParallelWorkerNumber, FixedParallelState, and the shm keys; these are
used to decide which blocks need to be scanned by each worker. We
might change the way parallel scan/work distribution is done in
future, but I don't see any harm in exposing this information. A
small sketch of the idea follows.
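
Just to illustrate how ParallelWorkerNumber can drive block selection
(this is not necessarily the exact scheme the patch ends up with), a
simple round-robin assignment could look like:

#include "postgres.h"
#include "storage/block.h"

/* Exposed via parallel.h by this patch. */
extern int ParallelWorkerNumber;

/*
 * Purely illustrative distribution, assuming nworkers > 0: worker N
 * scans blocks N, N + nworkers, N + 2 * nworkers, and so on.
 */
static bool
block_assigned_to_me(BlockNumber blkno, int nworkers)
{
	return (blkno % (BlockNumber) nworkers) ==
		(BlockNumber) ParallelWorkerNumber;
}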

4. Sending ReadyForQuery

> If the
> plan happens to need that amount of memory and if it actually isn't
> available when needed, then performance will suck; conceivably, the
> OOM killer might trigger. But it's the user's job to avoid this by
> not setting work_mem too high in the first place. Whether this system
> is for the best is arguable: one can certainly imagine a system where,
> if there's not enough memory at execution time, we consider
> alternatives like (a) replanning with a lower memory target, (b)
> waiting until more memory is available, or (c) failing outright in
> lieu of driving the machine into swap. But devising such a system is
> complicated -- for example, replanning with a lower memory target
> might latch onto a far more expensive plan, such that we would have
> been better off waiting for more memory to be available; yet trying
> to wait until more memory is available might result in waiting
> forever. And that's why we don't have such a system.
>
> We don't need to do any better here. The GUC should tell us how many
> parallel workers we should anticipate being able to obtain. If other
> settings on the system, or the overall system load, preclude us from
> obtaining that number of parallel workers, then the query will take
> longer to execute; and the plan might be sub-optimal. If that happens
> frequently, the user should lower the planner GUC to a level that
> reflects the resources actually likely to be available at execution
> time.
>
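
Just so we are talking about the same behaviour, here is a minimal
sketch of that smooth degradation: try for the planned number of
workers, accept however many the system can actually give us, and fall
back to the leader alone if none are available. launch_one_worker() is
a hypothetical wrapper around RegisterDynamicBackgroundWorker():

#include "postgres.h"
#include "postmaster/bgworker.h"

/*
 * Hypothetical wrapper that fills in a BackgroundWorker struct and
 * calls RegisterDynamicBackgroundWorker(); returns false if no slot
 * was available.
 */
extern bool launch_one_worker(int worker_id,
							  BackgroundWorkerHandle **handle);

static int
launch_parallel_workers(int nplanned, BackgroundWorkerHandle **handles)
{
	int			i;
	int			nlaunched = 0;

	for (i = 0; i < nplanned; i++)
	{
		/* Stop at the first failure instead of throwing an error. */
		if (!launch_one_worker(i, &handles[nlaunched]))
			break;
		nlaunched++;
	}

	/* Zero is acceptable: the scan then runs entirely in the leader. */
	return nlaunched;
}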

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
