Re: parallel joins, and better parallel explain

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel joins, and better parallel explain
Date: 2016-01-07 11:34:25
Message-ID: CAFiTN-ti8PS7Ku8a63P=ePVWtjCAYvpidfD1+sEs+GAfjeJeKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 6, 2016 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jan 4, 2016 at 8:52 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > One strange behaviour, after increasing number of processor for VM,
> > max_parallel_degree=0; is also performing better.
>
> So, you went from 6 vCPUs to 8? In general, adding more CPUs means
> that there is less contention for CPU time, but if you already had 6
> CPUs and nothing else running, I don't know why the backend running
> the query would have had a problem getting a whole CPU to itself. If
> you previously only had 1 or 2 CPUs then there might have been some
> CPU competition with background processes, but if you had 6 then I
> don't know why the max_parallel_degree=0 case got faster with 8.
>

I am really not sure about this case, may be CPU allocation in virtual
machine had problem.. but can't say anything

> Anyway, I humbly suggest that this query isn't the right place to put
> our attention. There's no reason why we can't improve things further
> in the future, and if it turns out that lots of people have problems
> with the cost estimates on multi-batch parallel hash joins, then we
> can revise the cost model. We wouldn't treat a single query where a
> non-parallel multi-batch hash join run slower than the costing would
> suggest as a reason to revise the cost model for that case, and I
> don't think this patch should be held to a higher standard. In this
> particular case, you can easily make the problem go away by tuning
> configuration parameters, which seems like an acceptable answer for
> people who run into this,

Yes, i agree with this point, cost model can always be improved. And anyway
in most of the queries even in TPC-H benchmark we have seen big improvement
with parallel join.

I have done further testing for observing the plan time, using TPC-H
queries and some other many table join queries(7-8 tables)..

I did not find any visible regression in planning time...

*There are many combinations of queries i have tested, and because of big
size of query and result did not attached in the mail... let me know if
anybody want to know the details of queries...

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-01-07 11:35:36 code to deparse parameter in postgres_fdw is duplicated
Previous Message Andres Freund 2016-01-07 11:23:41 Re: Relation extension scalability