Re: [DESIGN] ParallelAppend

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-08-19 03:27:11
Message-ID: CAA4eK1LS2NO58LodfXrFdSAOzn_O6MyKgGQLYdc5UC+Qein90g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 13, 2015 at 5:26 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:

> > On Fri, Aug 7, 2015 at 2:15 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > >
> >
> > Sure, that is what we should do. However, the tricky part would be when
> > the path for doing a local scan is much cheaper than the path for a
> > parallel scan for one of the child nodes. For such cases, pulling up the
> > Funnel node can incur more cost. I think some of the other possible ways
> > to make this work could be to extend Funnel so that it is capable of
> > executing both parallel and non-parallel nodes, or to have a new
> > Funnel-like node which has such a capability.
> >
> I think that is the job of a (more intelligent) planner, but not in the
> first version. If the subplans of Append are a mixture of nodes that are
> and are not worth executing in parallel, we will be able to rearrange the
> original form:
>
> Append
> + Scan on rel1 (large)
> + Scan on rel2 (large)
> + Scan on rel3 (middle)
> + Scan on rel4 (tiny)
> + Scan on rel5 (tiny)
>
> to Funnel aware form, but partially:
>
> Append
> + Funnel
> | + Scan on rel1 (large)
> | + Scan on rel2 (large)
> | + Scan on rel3 (large)
> + Scan on rel4 (tiny)
> + Scan on rel5 (tiny)
>
>
This is exactly what I have in mind.
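
To make the rearrangement concrete, here is a minimal sketch of the idea. It
is written in Python purely as an illustration and is not planner code: the
Scan/Funnel/Append classes, the est_pages field, the min_parallel_pages
threshold, and the pull_up_funnel() function are all hypothetical names
invented for this example. It assumes the planner already knows which children
are worth scanning in parallel and simply groups those under one Funnel.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Scan:
    rel: str
    est_pages: int  # hypothetical size estimate used to decide parallel-worthiness


@dataclass
class Funnel:
    children: List[Scan] = field(default_factory=list)


@dataclass
class Append:
    children: List[Union[Scan, Funnel]] = field(default_factory=list)


def pull_up_funnel(append: Append, min_parallel_pages: int) -> Append:
    """Group the children that look worth parallelizing under a single Funnel.

    Children below the (assumed) threshold stay as direct children of Append,
    so tiny relations are still scanned locally.
    """
    parallel: List[Scan] = []
    local: List[Union[Scan, Funnel]] = []
    for child in append.children:
        if isinstance(child, Scan) and child.est_pages >= min_parallel_pages:
            parallel.append(child)
        else:
            local.append(child)
    if not parallel:
        # Nothing is worth a Funnel; keep the original plan shape.
        return append
    return Append([Funnel(parallel)] + local)
```

With the five relations from the example above and a threshold that rel1 to
rel3 exceed, the result is an Append whose first child is a Funnel over rel1,
rel2 and rel3, followed by plain scans on rel4 and rel5. A real planner would
compare path costs rather than apply a fixed size threshold.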

>
> Here is one other issue I found. Existing code assumes a TOC segment has
> only one entry per node type, so it uses a pre-defined key (like
> PARALLEL_KEY_SCAN) per node type. However, that is problematic if we put
> multiple PlannedStmt or PartialSeqScan nodes on a TOC segment.
>

We have a few keys in the parallel-seq-scan patch
(PARALLEL_KEY_TUPLE_QUEUE and PARALLEL_KEY_INST_INFO) for
which multiple structures are shared between the master and worker backends.

Check if something similar can work for your use case.
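
As a rough model of what I mean (Python used only for illustration; the Toc
class, the key value, and the helper names below are made up for this sketch
and are not the shm_toc API): a single pre-defined key can point at an array of
structures, and each consumer picks its own element by an index it already
knows, for example a worker number or a per-node slot number.

```python
PARALLEL_KEY_SCAN = 1  # arbitrary value chosen for this illustration only


class Toc:
    """Toy table of contents: one value per key, as the existing code assumes."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, value):
        if key in self._entries:
            raise KeyError(f"key {key} is already present in the TOC")
        self._entries[key] = value

    def lookup(self, key):
        # Returns None when the key is absent.
        return self._entries.get(key)


def store_scan_states(toc, scan_states):
    """Publish all per-node scan states as one array under a single key."""
    toc.insert(PARALLEL_KEY_SCAN, list(scan_states))


def lookup_scan_state(toc, slot):
    """Fetch the state for one node; 'slot' is a hypothetical per-node index."""
    states = toc.lookup(PARALLEL_KEY_SCAN)
    if states is None:
        raise LookupError("no scan states were published in the TOC")
    return states[slot]
```

The key stays unique per node type, so the one-entry-per-key assumption still
holds; only the value becomes an array. The cost is that every node needs a
stable slot number agreed on by the master and the workers.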

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
