Re: TPC-H Q20 from 1 hour to 19 hours!

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-31 11:43:36
Message-ID: CAA4eK1+8HY0v4JS204zY_huAJNDrS85eTi250DdcAyyygjYS_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 30, 2017 at 8:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> What is however strange is that changing max_parallel_workers_per_gather
>> affects row estimates *above* the Gather node. That seems a bit, um,
>> suspicious, no? See the parallel-estimates.log.
>
> Thanks for looking at this! Comparing the parallel plan vs. the
> non-parallel plan:
>
> part: parallel rows (after Gather) 20202, non-parallel rows 20202
> partsupp: parallel rows 18, non-parallel rows 18
> part-partsupp join: parallel rows 88988, non-parallel rows 355951
> lineitem: parallel rows 59986112, non-parallel rows 59986112
> lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
> final join: parallel rows 131, non-parallel rows 524
>
> I agree with you that that looks mighty suspicious. Both the
> part-partsupp join and the final join have exactly 4x as many
> estimated rows in the non-parallel plan as in the parallel plan, and
> it just so happens that the parallel divisor here will be 4.
>
> Hmm... it looks like the parallel_workers value from the Gather node
> is being erroneously propagated up to the higher levels of the plan
> tree. Wow. Somehow, Gather Merge managed to get the logic correct
> here, but Gather is totally wrong. Argh. Attached is a draft patch,
> which I haven't really tested beyond checking that it passes 'make
> check'.
>

Your patch looks good to me. I have verified some join cases as well
where the behaviour is sane after patch. I have also done testing
with force_parallel_mode=regress (ran make check-world) and everything
seems good.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2017-03-31 11:50:44 Re: Variable substitution in psql backtick expansion
Previous Message Rafia Sabih 2017-03-31 11:06:23 Re: Parallel query execution with SPI