Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-05-22 09:46:36
Message-ID: CAA4eK1JUnd976LDdGfhb+d+A-KatrQc==3gDoXtLNNW-_wNgvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 23, 2015 at 2:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 22, 2015 at 8:48 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > I have implemented this idea (note that I have to expose a new API
> > shm_mq_from_handle as TupleQueueFunnel stores shm_mq_handle* and
> > we sum_mq* to call shm_mq_detach) and apart this I have fixed other
> > problems reported on this thread:
> >
> > 1. Execution of initPlan by master backend and then pass the
> > required PARAM_EXEC parameter values to workers.
> > 2. Avoid consuming dsm's by freeing the parallel context after
> > the last tuple is fetched.
> > 3. Allow execution of Result node in worker backend as that can
> > be added as a gating filter on top of PartialSeqScan.
> > 4. Merged parallel heap scan descriptor patch
> >
> > To apply the patch, please follow below sequence:
> >
> > HEAD Commit-Id: 4d930eee
> > parallel-mode-v9.patch [1]
> > assess-parallel-safety-v4.patch [2] (don't forget to run fixpgproc.pl
in
> > the patch)
> > parallel_seqscan_v14.patch (Attached with this mail)
>
> Thanks, this version looks like an improvement. However, I still see
> some problems:
>
> - I believe the separation of concerns between ExecFunnel() and
> ExecEndFunnel() is not quite right. If the scan is shut down before
> it runs to completion (e.g. because of LIMIT), then I think we'll call
> ExecEndFunnel() before ExecFunnel() hits the TupIsNull(slot) path. I
> think you probably need to create a static subroutine that is called
> both as soon as TupIsNull(slot) and also from ExecEndFunnel(), in each
> case cleaning up whatever resources remain.
>

Okay, added new routine FinishParallelSetupAndAccumStats() which
will be called both from ExecEndFunnel() and when ExecFunnel() hits
the TupIsNull(slot) path. Apart from that the same routine is called
from some other paths like rescan and when we need to collect
statistics after execution is complete but still ExecEndFunnel() is
not called. This routine ensures that once it has collected the
stats of parallel workers and destroyed the parallel context, it will
do nothing on next execution unless the node is re-initialized.

> - InitializeParallelWorkers() still mixes together general parallel
> executor concerns with concerns specific to parallel sequential scan
> (e.g. EstimatePartialSeqScanSpace). We have to eliminate everything
> that assumes that what's under a funnel will be, specifically, a
> partial sequential scan.
>

Okay, introduced the new function planstate_tree_walker(), so that
it can work for anything below funnel node.

>
> - shm_mq_from_handle() is probably reasonable, but can we rename it
> shm_mq_get_queue()?
>

Changed as per suggestion.

> - It's hard to believe this is right:
>
> + if (parallelstmt->inst_options)
> + receiver = None_Receiver;
>
> Really? Flush the tuples if there are *any instrumentation options
> whatsoever*? At the very least, that doesn't look too future-proof,
> but I'm suspicious that it's outright incorrect.
>

You are right, I have removed this part of code.

> - I think ParallelStmt probably shouldn't be defined in parsenodes.h.
> That file is included in a lot of places, and adding all of those
> extra #includes there doesn't seem like a good idea for modularity
> reasons even if you don't care about partial rebuilds. Something that
> includes a shm_mq obviously isn't a "parse" node in any meaningful
> sense anyway.
>

Changed postmaster/backendworkers.c to executor/execParallel.c
and moved ParallelStmt to executor/execParallel.h

> - I don't think you need both setup cost and startup cost. Starting
> up more workers isn't particularly more expensive than starting up
> fewer of them, because most of the overhead is in waiting for them to
> actually start, and the number of workers is reasonable, then they're
> all be doing that in parallel with each other. I suggest removing
> parallel_startup_cost and keeping parallel_setup_cost.
>

As per discussion, it makes sense to remove parallel_startup_cost.

> - In cost_funnel(), I don't think it's right to divide the run cost by
> nWorkers + 1. Suppose we've got a plan that looks like this:
>
> Funnel
> -> Hash Join
> -> Partial Seq Scan on a
> -> Hash
> -> Seq Scan on b
>
> The sequential scan on b is going to get executed once per worker,
> whereas the effort for the sequential scan on a is going to be divided
> over all the workers. So the right way to cost this is as follows:
>
> (a) The cost of the partial sequential scan on a is equal to the cost
> of a regular sequential scan, plus a little bit of overhead to account
> for communication via the ParallelHeapScanDesc, divided by the number
> of workers + 1.
> (b) The cost of the remaining nodes under the funnel works normally.
> (c) The cost of the funnel is equal to the cost of the hash join plus
> number of tuples multiplied by per-tuple communication overhead plus a
> large fixed overhead reflecting the time it takes the workers to
> start.
>

Okay, changed as per suggestion.

> - While create_parallelscan_paths() is quite right to limit the number
> of workers to no more than the number of pages, it's pretty obvious
> that in practice that's way too conservative. I suggest we get
> significantly more aggressive about that, like limiting ourselves to
> one worker per thousand pages. We don't really know exactly what the
> costing factors should be here just yet, but we certainly know that
> spinning up lots of workers to read a handful of pages each must be
> dumb. And we can save a significant amount of planning time here by
> not bothering to generate parallel paths for little tiny relations.
>

Right, I have changed as per suggestion, but now it will only choose
the parallel path for bigger relations, so to test with smaller relations
one way is to reduce the cpu_tuple_comm_cost.

Note - You need to apply assess-parallel-safety-v5.patch (posted by
Robert on thread assessing parallel-safety) before this patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_seqscan_v15.patch application/octet-stream 156.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-05-22 09:59:44 Re: [COMMITTERS] pgsql: At promotion, archive last segment from old timeline with .parti
Previous Message Shigeru Hanada 2015-05-22 09:37:58 Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)