Quick Links

Re: Partition-wise join for join between (declaratively) partitioned tables

From:	Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partition-wise join for join between (declaratively) partitioned tables
Date:	2017-09-18 04:48:24
Message-ID:	CAOGQiiN9m=KRf-et1T0AcimbyAB9hDzJqGkHnOBjWT4uF1z1BQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Sep 15, 2017 at 5:29 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Fri, Sep 15, 2017 at 2:09 PM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> On TPC-H benchmarking of this patch, I found a regression in Q7. It
>> was taking some 1500s with the patch and some 900s without the patch.
>> Please find the attached pwd_reg.zip for the output of explain analyse
>> on head and with patch.
>>
>> The experimental settings used were,
>> commit-id = 0c504a80cf2e6f66df2cdea563e879bf4abd1629
>> patch-version = v26
>>
>> Server settings:
>> work_mem = 1GB
>> shared_buffers = 10GB
>> effective_cache_size = 10GB
>> max_parallel_workers_per_gather = 4
>>
>> Partitioning information:
>> Partitioning scheme = by range
>> Number of partitions in lineitem and orders table = 106
>> partition key for lineitem = l_orderkey
>> partition key for orders = o_orderkey
>
> I observe that with partition-wise join patch the planner is using
> GatherMerge along-with partition-wise join and on head its not using
> GatherMerge. Just to make sure that its partition-wise join which is
> causing regression and not GatherMerge, can you please run the query
> with enable_gathermerge = false?
>
That does not sound plausible since around 130s are already spent till
append node. Anyhow, I executed the query with enable_gathermerge =
false, and still it is taking some 1500 secs. Please find the attached
file for the explain analyse output.

> I see following lines explain analyze output 7_1.out without the patch
> -> Sort (cost=84634030.40..84638520.55 rows=1796063
> width=72) (actual time=1061001.435..1061106.608 rows=437209 loops=1)
> Sort Key: n1.n_name, n2.n_name,
> (date_part('year'::text, (lineitem_001.l_shipdate)::timestamp without
> time zone))
> Sort Method: quicksort Memory: 308912kB
> -> Hash Join (cost=16080591.94..84447451.72
> rows=1796063 width=72) (actual time=252745.701..1057447.219
> rows=1749956 loops=1)
> Since Sort doesn't filter any rows, we would expect it to output the
> same number of rows as hash join underneath it. But the number of rows
> differ in this case. I am wondering whether there's some problem with
> the explain analyze output itself.
>

Limit (cost=83341943.28..83341943.35 rows=1 width=92) (actual
time=1556989.996..1556989.997 rows=1 loops=1)
-> Finalize GroupAggregate (cost=83341943.28..83342723.24
rows=10064 width=92) (actual time=1556989.994..1556989.994 rows=1
loops=1)
Group Key: n1.n_name, n2.n_name, (date_part('year'::text,
(lineitem_001.l_shipdate)::timestamp without time zone))
-> Sort (cost=83341943.28..83342043.92 rows=40256 width=92)
(actual time=1556989.910..1556989.911 rows=6 loops=1)
Sort Key: n1.n_name, n2.n_name,
(date_part('year'::text, (lineitem_001.l_shipdate)::timestamp without
time zone))
Sort Method: quicksort Memory: 27kB
-> Gather (cost=83326804.81..83338864.31 rows=40256
width=92) (actual time=1550598.855..1556989.760 rows=20 loops=1)
Workers Planned: 4
Workers Launched: 4

AFAICU the node above sort is group-aggregate and then there is limit,
and the number of rows for sort node in explain analyse is returned
number of rows. So, what is happening here is once one group is
completed it is aggregated and fetched by limit, now there is no need
for sort to return any more rows and hence the result.
>>
>> Apart from these there is a regression case on a custom table, on head
>> query completes in 20s and with this patch it takes 27s. Please find
>> the attached .out and .sql file for the output and schema for the test
>> case respectively. I have reported this case before (sometime around
>> March this year) as well, but I am not sure if it was overlooked or is
>> an unimportant and expected behaviour for some reason.
>>
>
> Are you talking about [1]? I have explained about the regression in
> [2] and [3]. This looks like an issue with the existing costing model.
>
> [1] https://www.postgresql.org/message-id/CAOGQiiMwcjNrunJ_fCDBscrTLeJ-CLp7exfzzipe2ut71n4LUA@mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAFjFpRedUZPa7tKbCLEGK3u5UWdDNQoN=eYfb7ieG5d0D1PbsQ@mail.gmail.com
> [3] https://www.postgresql.org/message-id/CAFjFpReJKSdCfaeuZjGD79hOETzpz5BKDxLJgxr7qznrXX+TRw@mail.gmail.com
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment	Content-Type	Size
7_gm_false.out	application/octet-stream	129.3 KB

In response to

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-09-15 11:59:21 from Ashutosh Bapat

Responses

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-09-21 13:07:25 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2017-09-18 04:50:28	pg_control_recovery() return value when not in recovery
Previous Message	Amit Kapila	2017-09-18 04:26:49	parallel.c oblivion of worker-startup failures