Re: [DESIGN] ParallelAppend

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-07-27 12:09:54
Message-ID: CAA4eK1LP_rdJC-Mc-4JJJAjciq2UJR7Jj_HiKdEverANtd=imA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
> Hello,
>
> I have recently been working on and investigating a ParallelAppend
> feature for the next commitfest. Below is my design proposal.
>
> 1. Concept
> ----------
> The concept is simple enough that anybody might have considered it
> more than once. The ParallelAppend node kicks off background worker
> processes to execute child nodes in parallel / asynchronously.
> It intends to improve the performance of scanning large partitioned
> tables from the standpoint of overall throughput; however, latency
> of the first several hundred rows is not in scope for this project.
> From the standpoint of technology trends, it primarily tries to
> utilize multi-core capability within a system, but it also allows
> expansion to a distributed database environment using the
> foreign-table inheritance feature.
> Its behavior is very similar to the Funnel node except for several
> points; thus, we can reuse the infrastructure that we have discussed
> at length through the v9.5 development cycle.
>
> 2. Problems to be solved
> -------------------------
> Typical OLAP workloads involve many table joins and scans on large
> tables, which are often partitioned. Their KPI is query response
> time, but only a very small number of sessions are active
> simultaneously. So, we are required to run a single query as rapidly
> as possible, even if it consumes more computing resources than
> typical OLTP workloads.
>
> The current implementation of heap scans is painful when we look at
> its behavior from the standpoint of how many rows we can read within
> a certain time, because it works synchronously.
> In the worst case, when a SeqScan node tries to fetch the next tuple,
> heap_getnext() looks up a block in shared buffers, then ReadBuffer()
> calls the storage manager to read the target block from the
> filesystem if it is not in the buffer. Next, the operating system
> puts the calling process to sleep until the required i/o completes.
> Most cases are resolved at an earlier stage than the above worst
> case; however, the best scenario we can expect is that the next
> tuple already sits at the top of the message queue (with visibility
> checks already done, of course), with no need to fall down to the
> buffer manager or deeper.
> If we can run multiple scans in parallel / asynchronously, the
> operating system will assign the CPU core to another process while
> one waits; thus, it eventually improves the i/o density and enables
> higher processing throughput.
> The Append node is an ideal point to parallelize because
> - child nodes can be in physically different locations by tablespace,
> so further tuning is possible according to the system landscape.
> - it can control whether a subplan is actually executed on a
> background worker, on a per-subplan basis. If the subplans contain
> large tables and small tables, ParallelAppend may kick off background
> workers to scan the large tables only, and scan the small tables itself.
> - As with the Funnel node, we don't need to care about enhancing
> individual node types. SeqScan, IndexScan, ForeignScan or others
> can perform as usual, but actually in parallel.
>
>
> 3. Implementation
> ------------------
> * Plan & Cost
>
> ParallelAppend shall appear wherever Append can appear, except for
> dummy usage. So, I'll enhance set_append_rel_pathlist() to add
> both an AppendPath and a ParallelAppendPath, with a cost for each.
>

Is there a real need for a new node like ParallelAppendPath?
Can't we have a Funnel node beneath the Append node, with each
worker responsible for running a SeqScan on an inherited child
relation? Something like

Append
  ---> Funnel
    --> SeqScan rel1
    --> SeqScan rel2
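
The plan shape above can be modeled with a toy sketch. This is not PostgreSQL executor code: Python threads stand in for background workers, in-memory lists stand in for child relations, and the names funnel(), append() and seq_scan() are hypothetical, chosen only to mirror the node names in the tree. It assumes each worker claims whole child relations, one at a time, and pushes the tuples into a shared queue that the leader drains.

```python
import queue
import threading


def seq_scan(rel):
    """Stand-in for SeqScan: yield every tuple of one child relation."""
    yield from rel


def funnel(child_rels, n_workers=2):
    """Toy Funnel: workers claim child scans and feed one shared queue."""
    tasks = queue.Queue()
    for rel in child_rels:
        tasks.put(rel)
    out = queue.Queue()
    done = object()  # sentinel, pushed once by each finished worker

    def worker():
        while True:
            try:
                rel = tasks.get_nowait()  # claim the next unscanned child
            except queue.Empty:
                break
            for tup in seq_scan(rel):
                out.put(tup)
        out.put(done)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Leader side: return tuples in arrival order until all workers finish.
    finished = 0
    while finished < n_workers:
        item = out.get()
        if item is done:
            finished += 1
        else:
            yield item
    for t in threads:
        t.join()


def append(subplans):
    """Toy Append: return the output of each subplan in turn."""
    for plan in subplans:
        yield from plan


rel1 = [("rel1", i) for i in range(3)]
rel2 = [("rel2", i) for i in range(2)]
rows = list(append([funnel([rel1, rel2])]))
```

In this model every tuple of rel1 and rel2 reaches the leader exactly once, but the interleaving of the two relations depends on thread scheduling, so a consumer cannot rely on any particular order.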

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
