Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-03-04 00:47:14
Message-ID: CAA4eK1KsBje1A8=v-mdyD982452Q8nvxhLEY6_KvDwuA0BqBUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 22, 2015 at 6:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Feb 17, 2015 at 11:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> > My only "problem" with that description is that I think workers will
> > have to work on more than one node - it'll be entire subtrees of the
> > executor tree.
>
> Amit and I had a long discussion about this on Friday while in Boston
> together. I previously argued that the master and the slave should be
> executing the same node, ParallelSeqScan. However, Amit argued
> persuasively that what the master is doing is really pretty different
> from what the worker is doing, and that they really ought to be
> running two different nodes. This led us to cast about for a better
> design, and we came up with something that I think will be much
> better.
>
> The basic idea is to introduce a new node called Funnel. A Funnel
> node will have a left child but no right child, and its job will be to
> fire up a given number of workers. Each worker will execute the plan
> which is the left child of the funnel. The funnel node itself will
> pull tuples from all of those workers, and can also (if there are no
> tuples available from any worker) execute the plan itself.

I have modified the patch to introduce a Funnel node (and left child
as PartialSeqScan node). Apart from that, some other noticeable
changes based on feedback include:
a) Master backend forms and send the planned stmt to each worker,
earlier patch use to send individual elements and form the planned
stmt in each worker.
b) Passed tuples directly via tuple queue instead of going via
FE-BE protocol.
c) Removed restriction of expressions in target list.
d) Introduced a parallelmodeneeded flag in plannerglobal structure
and set it for Funnel plan.

There is still some work left like integrating with
access-parallel-safety patch (use parallelmodeok flag to decide
whether parallel path can be generated, Enter/Exit parallel mode is still
done during execution of funnel node).

I think these are minor points which can be fixed once we decide
on the other major parts of patch. Find modified patch attached with
this mail.

Note -
This patch is based on Head (commit-id: d1479011) +
parallel-mode-v6.patch [1] + parallel-heap-scan.patch[2]

[1]
http://www.postgresql.org/message-id/CA+TgmobCMwFOz-9=hFv=hJ4SH7p=5X6Ga5V=WtT8=huzE6C+Mg@mail.gmail.com
[2]
http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_seqscan_v8.patch application/octet-stream 105.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-03-04 00:55:49 Re: Combining Aggregates
Previous Message Stephen Frost 2015-03-04 00:38:34 Re: Proposal: knowing detail of config files via SQL