Re: parallel joins, and better parallel explain

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel joins, and better parallel explain
Date: 2016-01-05 01:52:10
Message-ID: CAFiTN-s1aBjTZdmNih_noiPm6TUQOV=soLgxcxKdGKjsq+-ZzQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 5, 2016 at 1:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jan 4, 2016 at 4:50 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > I tried to create an inner table such that the inner table's data doesn't
> > fit in RAM (I created a VM with 1GB RAM). The purpose of this is to make
> > the disk scan dominant, and since the parallel join repeats the disk scan
> > and hash table build of the inner table, there will be a lot of parallel
> > I/O and it has to pay that penalty.
> >
> > I think even though the inner table scan and hash table build are done in
> > parallel, the large amount of parallel I/O will become the bottleneck.
>
> Hmm. Because only 1/1024th of the hash table fits in work_mem, the
> executor is going to have to write out all of the tuples that don't
> belong to the first batch to a temporary file and then read them back
> in. So each backend is going to write essentially the entirety of t2
> out to disk and then read it all back in again. The non-parallel case
> will also write most of the table contents and then read them back in,
> but at least it will only be doing that once rather than 7 times, so
> it's not as bad. Also, with fewer backends running, the non-parallel
> case will have a bit more memory free for caching purposes.
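
Right, the plans below all show "Batches: 1024" with only ~2.8MB of hash
memory used. Just to illustrate the relationship between work_mem and the
batch count (a rough sketch only; a work_mem large enough to get a single
batch would not actually fit on this 1GB VM, so the values are purely
illustrative):

set work_mem = '4MB';
explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1;
-- with a small work_mem the Hash node reports many batches (1024 in the
-- runs below), so each process spills most of t2 to temp files and
-- reads it back.

set work_mem = '4GB';
explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1;
-- with enough memory the Hash node would report "Batches: 1" and the
-- per-process spill would disappear.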
>
> > Do we need to consider the cost of parallel I/O as well? I am not sure
> > whether we can really do that...?
>
> It seems to me that the problem here is that you've set
> max_parallel_degree to an unrealistically high value. The query
> planner is entitled to assume that the user has set
> max_parallel_degree to a value which is small enough that the workers
> won't be fighting too viciously with each other over resources. It
> doesn't really matter whether those resources are CPU resources or I/O
> resources. I'm wondering if your 1GB VM really even has as many as 7
> vCPUs, because that would seem to be something of an unusual
> configuration - and if it doesn't, then setting max_parallel_degree to
> a value that high is certainly user error. Even if it does, it's still
> not right to set the value as high as six unless the system also has
> enough I/O bandwidth to accommodate the amount of I/O that you expect
> your queries to generate, and here it seems like it probably doesn't.
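
(Each participating process scans t2 in full and, with 1024 batches, also
writes most of it to temp files and reads it back, so the total I/O scales
with the number of workers. A quick way to see how much data that is per
process:)

SELECT pg_size_pretty(pg_relation_size('t2'));
-- heap size of t2; roughly this much is read, spilled, and re-read by
-- every participating process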
>
> To put that another way, you can always make parallel query perform
> badly by telling it to use too many workers relative to the size of
> the machine you have. This is no different than getting bad query
> plans by configuring work_mem or effective_cache_size or any other
> query planner GUC to a value that doesn't reflect the actual execution
> environment. I would only consider this to be a problem with the
> parallel join patch if the chosen plan is slower even on a machine
> that's big enough to justify setting max_parallel_degree=6 in the
> first place.
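
(For reference, the kind of alignment meant here would look roughly like
the following for this VM; the values are only illustrative and not what
was used for the runs below.)

set effective_cache_size = '768MB';  -- roughly what a 1GB VM can cache
set work_mem = '4MB';                -- per-node, per-process hash/sort memory
set max_parallel_degree = 1;         -- match the cores and I/O really available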

Yes, that's a valid point. I had configured 6 processors for the virtual
machine, but that was with HT. So this time I have configured 8 processors
and taken the performance numbers again with a lower parallel degree.

Even with a lower parallel degree there is some regression, but as you
mentioned there can be other resource limitations; for example, I am using
a 50GB disk and filling 20GB of it with data.

I think you are right: before coming to any conclusion, we need to test on
a really high-end machine where the machine itself doesn't have any
resource constraints.

VM configuration:
RAM: 1GB
Processors: 8 (Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz) --> the host is an
i7, so I doubt it really provides 8 full cores, which is why I tested with a
lower parallel degree.
SSD: 50GB

postgres=# set max_parallel_degree=3;
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1 AND t2.c2 + t1.c1 > 100;
                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7920946.47..7920946.48 rows=1 width=0) (actual time=162329.829..162329.829 rows=1 loops=1)
   ->  Gather  (cost=1527963.25..7880946.39 rows=16000033 width=0) (actual time=58233.106..159140.629 rows=47999950 loops=1)
         Number of Workers: 3
         ->  Hash Join  (cost=1526963.25..6279943.09 rows=16000033 width=0) (actual time=58346.087..144309.987 rows=11999988 loops=4)
               Hash Cond: (t1.c1 = t2.c1)
               Join Filter: ((t2.c2 + t1.c1) > 100)
               Rows Removed by Join Filter: 12
               ->  Parallel Seq Scan on t1  (cost=0.00..2064959.01 rows=32259701 width=4) (actual time=98.514..27003.566 rows=25000000 loops=4)
               ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=58012.228..58012.228 rows=48000000 loops=4)
                     Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
                     ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=3.524..9634.181 rows=48000000 loops=4)
 Planning time: 1.945 ms
 Execution time: 162330.657 ms

postgres=# set max_parallel_degree=2;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1 AND t2.c2 + t1.c1 > 100;
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8744354.81..8744354.82 rows=1 width=0) (actual time=133715.245..133715.245 rows=1 loops=1)
   ->  Gather  (cost=1527963.25..8704354.73 rows=16000033 width=0) (actual time=49240.892..130699.685 rows=47999950 loops=1)
         Number of Workers: 2
         ->  Hash Join  (cost=1526963.25..7103351.43 rows=16000033 width=0) (actual time=48916.074..116934.088 rows=15999983 loops=3)
               Hash Cond: (t1.c1 = t2.c1)
               Join Filter: ((t2.c2 + t1.c1) > 100)
               Rows Removed by Join Filter: 17
               ->  Parallel Seq Scan on t1  (cost=0.00..2159049.80 rows=41668780 width=4) (actual time=106.882..22650.646 rows=33333333 loops=3)
               ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=48670.370..48670.370 rows=48000000 loops=3)
                     Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
                     ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.618..7908.589 rows=48000000 loops=3)
 Planning time: 0.380 ms
 Execution time: 133715.932 ms
(13 rows)

postgres=# set max_parallel_degree=0;
SET
postgres=# explain analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1 AND t2.c2 + t1.c1 > 100;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12248485.55..12248485.56 rows=1 width=0) (actual time=92297.234..92297.234 rows=1 loops=1)
   ->  Hash Join  (cost=1526963.25..12208485.47 rows=16000033 width=0) (actual time=15739.911..89627.652 rows=47999950 loops=1)
         Hash Cond: (t1.c1 = t2.c1)
         Join Filter: ((t2.c2 + t1.c1) > 100)
         Rows Removed by Join Filter: 50
         ->  Seq Scan on t1  (cost=0.00..2742412.72 rows=100005072 width=4) (actual time=127.260..24826.175 rows=100000000 loops=1)
         ->  Hash  (cost=739461.00..739461.00 rows=48000100 width=8) (actual time=15560.002..15560.002 rows=48000000 loops=1)
               Buckets: 131072  Batches: 1024  Memory Usage: 2856kB
               ->  Seq Scan on t2  (cost=0.00..739461.00 rows=48000100 width=8) (actual time=0.834..6199.727 rows=48000000 loops=1)
 Planning time: 0.244 ms
 Execution time: 92298.000 ms
(11 rows)

One strange behaviour: after increasing the number of processors for the VM,
the max_parallel_degree=0 case is also performing better.
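
A possible reason is simply better caching with fewer processes competing
for the 1GB of RAM. One way to check (just a sketch, using the BUFFERS
option of EXPLAIN):

explain (analyze, buffers) SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1 AND t2.c2 + t1.c1 > 100;
-- "Buffers: shared hit=... read=..." under the scan nodes shows how much of
-- t1/t2 came from cache rather than disk, and "temp read=... written=..."
-- shows how much each process spilled for the 1024 hash batches.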

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
