Re: TPC-H Q20 from 1 hour to 19 hours!

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-30 02:54:56
Message-ID: CA+Tgmobqu8zz+ae0JSrT=bWsLdDkZqGFo=qbrvtM7t7rVTU9yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> What is however strange is that changing max_parallel_workers_per_gather
> affects row estimates *above* the Gather node. That seems a bit, um,
> suspicious, no? See the parallel-estimates.log.

Thanks for looking at this! Comparing the parallel plan vs. the
non-parallel plan:

part: parallel rows (after Gather) 20202, non-parallel rows 20202
partsupp: parallel rows 18, non-parallel rows 18
part-partsupp join: parallel rows 88988, non-parallel rows 355951
lineitem: parallel rows 59986112, non-parallel rows 59986112
lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
final join: parallel rows 131, non-parallel rows 524

I agree with you that that looks mighty suspicious. Both the
part-partsupp join and the final join have exactly 4x as many
estimated rows in the non-parallel plan as in the parallel plan, and
it just so happens that the parallel divisor here will be 4.

Hmm... it looks like the parallel_workers value from the Gather node
is being erroneously propagated up to the higher levels of the plan
tree. Wow. Somehow, Gather Merge managed to get the logic correct
here, but Gather is totally wrong. Argh. Attached is a draft patch,
which I haven't really tested beyond checking that it passes 'make
check'.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fix-gather-worker-propagation.patch application/octet-stream 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ideriha, Takeshi 2017-03-30 02:57:50 Re: [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG
Previous Message Vitaly Burovoy 2017-03-30 02:18:03 Re: sequence data type