Re: parallelize queries containing subplans

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing subplans
Date: 2016-12-28 11:24:34
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Currently, queries that have references to SubPlans or
> AlternativeSubPlans are considered parallel-restricted. I think we
> can lift this restriction in many cases especially when SubPlans are
> parallel-safe. To make this work, we need to propagate the
> parallel-safety information from path node to plan node and the same
> could be easily done while creating a plan. Another option could be
> that instead of propagating parallel-safety information from path to
> plan, we can find out from the plan if it is parallel-safe (doesn't
> contain any parallel-aware node) by traversing whole plan tree, but I
> think it is a waste of cycles. Once we have parallel-safety
> information in the plan, we can use that for detection of
> parallel-safe expressions in max_parallel_hazard_walker(). Finally,
> we can pass all the subplans to workers during plan serialization in
> ExecSerializePlan(). This will enable workers to execute subplans
> that are referred in parallel part of the plan. Now, we might be able
> to optimize it such that we pass only subplans that are referred in
> parallel portion of plan, but I am not sure if it is worth the trouble
> because it is one-time cost and much lesser than other things we do
> (like creating
> dsm, launching workers).
> Attached patch implements the above idea. This will enable
> parallelism for queries containing un-correlated subplans, an example
> of which is as follows:
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
> set min_parallel_relation_size=50;
> create table t1 (i int, j int, k int);
> create table t2 (i int, j int, k int);
> insert into t1 values (generate_series(1,10)*random(),
> generate_series(5,50)*random(),
> generate_series(8,80)*random());
> insert into t2 values (generate_series(4,10)*random(),
> generate_series(5,90)*random(),
> generate_series(2,10)*random());
> Plan without Patch
> -----------------------------
> postgres=# explain select * from t1 where t1.i not in (select t2.i
> from t2 where t2.i in (1,2,3));
> ---------------------------------------------------------------
> Seq Scan on t1 (cost=110.84..411.72 rows=8395 width=12)
> Filter: (NOT (hashed SubPlan 1))
> SubPlan 1
> -> Seq Scan on t2 (cost=0.00..104.50 rows=2537 width=4)
> Filter: (i = ANY ('{1,2,3}'::integer[]))
> (5 rows)
> Plan with Patch
> ------------------------
> postgres=# explain select * from t1 where t1.i not in (select t2.i
> from t2 where t2.i in (1,2,3));
> -------------------------------------------------------------------------
> Gather (cost=110.84..325.30 rows=8395 width=12)
> Workers Planned: 1
> -> Parallel Seq Scan on t1 (cost=110.84..325.30 rows=4938 width=12)
> Filter: (NOT (hashed SubPlan 1))
> SubPlan 1
> -> Seq Scan on t2 (cost=0.00..104.50 rows=2537 width=4)
> Filter: (i = ANY ('{1,2,3}'::integer[]))
> (7 rows)
> We have observed that Q-16 in TPC-H have been improved with the patch
> and the analysis of same will be shared by my colleague Rafia.
To study the effect of uncorrelated sub-plan pushdown on TPC-H and
TPC-DS benchmark queries we performed some experiments and the
execution time results for same are summarised as follows,

Query | HEAD | Patches | scale-factor
DS-Q45 | 35 | 19 | scale-factor: 100
H-Q16 | 812 | 645 | scale-factor: 300
H-Q16 | 49 | 37 | scale-factor: 20

Execution time given in above table is in seconds. Detailed analysis
of this experimentation is as follows,
Additional patches applied in this analysis are,
Parallel index scan [1]
Parallel index-only scan [2]
Parallel merge-join [3]
The system setup used for this experiment is,

Server parameter settings:
work_mem = 500 MB,
max_parallel_workers_per_gather = 4,
random_page_cost = seq_page_cost = 0.1 = parallel_tuple_cost,
shared_buffers = 1 GB
Machine used: IBM Power, 4 socket machine, 512 GB RAM

TPC-DS scale factor = 100 (approx size of database is 150 GB)

Query 45 which takes around 35 seconds on head, completes in 19
seconds with these patches. The point to note here is that without
this patch of pushing uncorrelated sublans, hash join which is using
subplan in join filter could not be pushed to workers and hence query
was unable to use the parallelism enough, with this patch parallelism
is available till the topmost join node. The output of explain analyse
statement of this query on both head and with patches is given in
attached file ds_q45.txt.

On further evaluating these patches on TPC-H queries on different
scale factors we came across query 16, for TPC-H scale factor 20 and
aforementioned parameter settings with the change of
max_parallel_workers_per gather = 2, it took 37 seconds with the
patches and 49 seconds on head. Though the improvement in overall
query performance is not appearing to be significantly high, yet the
point to note here is that the time taken by join was around 26
seconds on head which reduced to 14 seconds with the patches. Overall
benefit in performance is not high because sort node is dominating the
execution time. The plan information of this query is given in
attached file h_q16_20_2.txt.

On increasing the scale factor to 300, setting work_mem to 1GB,
increasing max_parallel_workers_per_gather = 6, and disabling the
parallel sequential scan for supplier table by 'alter table supplier
set (parallel_workers = 0)', Q16 completes in 645 seconds with
patches, which was taking 812 seconds on head. We need to disable
parallel_workers for supplier table because on higher worker count it
was taking parallel seq scan and hence the scan node that is using
subplan could not be parallelised. For this query both pushdown of
subplans and parallel merge-join, without any one of these the
benefits of parallelism might not be leveraged fully. The output of
explain analyse for this query is given in h_q16_300.txt

Overall, with pushdown of uncorrelated sub-plans to workers enables
the parallelism in joins which was restricted before and hence good
improvement in query performance can be witnessed.

> Now, we can further extend this to parallelize queries containing
> correlated subplans like below:
> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
> -------------------------------------------------------------
> Seq Scan on t1 (cost=0.00..831049.09 rows=8395 width=12)
> Filter: (SubPlan 1)
> SubPlan 1
> -> Seq Scan on t2 (cost=0.00..97.73 rows=493 width=4)
> Filter: (i = t1.i)
> (5 rows)
As per my analysis this extension to correlated subplans is likely to
improve the performance of following queries -- Q2 in TPC-H and Q6 in

> Yet, another useful enhancement in this area could be to consider both
> parallel and non-parallel paths for subplans. As of now, we consider
> the cheapest/best path and form subplan from it, but it is quite
> possible that instead of choosing parallel path (in case it is
> cheapest) at subplan level, the non-parallel path at subplan level
> could be beneficial when upper plan can use parallelism. I think this
> will be a separate project in itself if we want to do this and based
> on my study of TPC-H and TPC-DS queries, I am confident that this will
> be helpful in certain queries at higher scale factors.
I agree as then we do not need to disable parallelism for particular
relations as we currently do for supplier relation in Q16 of TPC-H.


Rafia Sabih

Attachment Content-Type Size
ds_q45.txt text/plain 8.7 KB
h_q16_20_2.txt text/plain 5.8 KB
h_q16_300.txt text/plain 5.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-12-28 11:50:50 parallelize queries containing initplans
Previous Message Craig Ringer 2016-12-28 10:00:37 Re: [PATCH] Transaction traceability - txid_status(bigint)