Re: BUG #15430: partition-wise join only works in combination with pruning on 1 partition

From: Bart D <agile(dot)data(dot)analytics(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #15430: partition-wise join only works in combination with pruning on 1 partition
Date: 2018-10-15 19:30:22
Message-ID: 20BCBC3B-6F69-46C4-8EB1-FCF77BBF9FF1@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Amit,

You are right, I think the feature should be enabled by default though.

This feature guarantees linear scalability.

My apologies, please close the bug.

Please find results below.

Thanks in advance,

psql <<-EOF
show enable_partitionwise_join;
set enable_partitionwise_join = on;
show enable_partitionwise_join;
explain
select *
from
part_test.measurement m1
inner join part_test.measurement2 m2
on m1.city_id = m2.city_id and m1.logdate = m2.logdate
EOF
# Append (cost=257.79..573.65 rows=172 width=32)
# -> Merge Join (cost=257.79..286.40 rows=86 width=32)
# Merge Cond: ((m1.city_id = m2.city_id) AND (m1.logdate = m2.logdate))
# -> Sort (cost=128.89..133.52 rows=1850 width=16)
# Sort Key: m1.city_id, m1.logdate
# -> Seq Scan on measurement_y2006m02 m1 (cost=0.00..28.50 rows=1850 width=16)
# -> Sort (cost=128.89..133.52 rows=1850 width=16)
# Sort Key: m2.city_id, m2.logdate
# -> Seq Scan on measurement2_y2006m02 m2 (cost=0.00..28.50 rows=1850 width=16)
# -> Merge Join (cost=257.79..286.40 rows=86 width=32)
# Merge Cond: ((m1_1.city_id = m2_1.city_id) AND (m1_1.logdate = m2_1.logdate))
# -> Sort (cost=128.89..133.52 rows=1850 width=16)
# Sort Key: m1_1.city_id, m1_1.logdate
# -> Seq Scan on measurement_y2006m03 m1_1 (cost=0.00..28.50 rows=1850 width=16)
# -> Sort (cost=128.89..133.52 rows=1850 width=16)
# Sort Key: m2_1.city_id, m2_1.logdate
# -> Seq Scan on measurement2_y2006m03 m2_1 (cost=0.00..28.50 rows=1850 width=16)

psql <<-EOF
show enable_partitionwise_join;
set enable_partitionwise_join = on;
show enable_partitionwise_join;
explain
select *
from
part_test.measurement m1
inner join part_test.measurement2 m2
on m1.city_id = m2.city_id and m1.logdate = m2.logdate
where m1.logdate in ('2006-02-01'::date, '2006-03-01'::date)
EOF
# Append (cost=33.40..160.82 rows=2 width=32)
# -> Hash Join (cost=33.40..80.41 rows=1 width=32)
# Hash Cond: ((m2.city_id = m1.city_id) AND (m2.logdate = m1.logdate))
# -> Seq Scan on measurement2_y2006m02 m2 (cost=0.00..28.50 rows=1850 width=16)
# -> Hash (cost=33.12..33.12 rows=18 width=16)
# -> Seq Scan on measurement_y2006m02 m1 (cost=0.00..33.12 rows=18 width=16)
# Filter: (logdate = ANY ('{2006-02-01,2006-03-01}'::date[]))
# -> Hash Join (cost=33.40..80.41 rows=1 width=32)
# Hash Cond: ((m2_1.city_id = m1_1.city_id) AND (m2_1.logdate = m1_1.logdate))
# -> Seq Scan on measurement2_y2006m03 m2_1 (cost=0.00..28.50 rows=1850 width=16)
# -> Hash (cost=33.12..33.12 rows=18 width=16)
# -> Seq Scan on measurement_y2006m03 m1_1 (cost=0.00..33.12 rows=18 width=16)
# Filter: (logdate = ANY ('{2006-02-01,2006-03-01}'::date[]))

> On 15 Oct 2018, at 06:18, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hi,
>
> On 2018/10/15 0:20, PG Bug reporting form wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 15430
>> Logged by: Bart Debersaques
>> Email address: agile(dot)data(dot)analytics(at)gmail(dot)com
>> PostgreSQL version: 11rc1
>> Operating system: Centos 7
>> Description:
>
> [ ... ]
>
>> psql <<-EOF
>> explain
>> select *
>> from
>> part_test.measurement m1
>> inner join part_test.measurement2 m2
>> on m1.city_id = m2.city_id and m1.logdate = m2.logdate
>> where m1.logdate in ('2006-02-01'::date, '2006-03-01'::date)
> EOF
>> # Hash Join (cost=66.97..170.25 rows=3 width=32)
>> # Hash Cond: ((m2.city_id = m1.city_id) AND (m2.logdate = m1.logdate))
>> # -> Append (cost=0.00..75.50 rows=3700 width=16)
>> # -> Seq Scan on measurement2_y2006m02 m2 (cost=0.00..28.50
>> rows=1850 width=16)
>> # -> Seq Scan on measurement2_y2006m03 m2_1 (cost=0.00..28.50
>> rows=1850 width=16)
>> # -> Hash (cost=66.43..66.43 rows=36 width=16)
>> # -> Append (cost=0.00..66.43 rows=36 width=16)
>> # -> Seq Scan on measurement_y2006m02 m1 (cost=0.00..33.12
>> rows=18 width=16)
>> # Filter: (logdate = ANY
>> ('{2006-02-01,2006-03-01}'::date[]))
>> # -> Seq Scan on measurement_y2006m03 m1_1 (cost=0.00..33.12
>> rows=18 width=16)
>> # Filter: (logdate = ANY
>> ('{2006-02-01,2006-03-01}'::date[]))
>>
>> # conclusion: join performed on full tab scans, join not performed on a
>> per-partion basis
>
> Have you changed the value of enable_partitionwise_join parameter to 'on'?
> It's 'off' by default, so join is not performed on per-partition basis.
>
> Thanks,
> Amit
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2018-10-16 00:29:29 Re: BUG #15430: partition-wise join only works in combination with pruning on 1 partition
Previous Message Tom Lane 2018-10-15 18:48:24 Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.