Re: Partition-wise join for join between (declaratively) partitioned tables

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-07-25 05:31:04
Message-ID: CAOGQiiPG2amSrd=RT3v2ZEN9fCGuEX8tpqWKoj2zJ8q-775k+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 21, 2017 at 12:11 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Fri, Jul 21, 2017 at 11:54 AM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> So, does this
>> also mean that a partitioned table will not join with an unpartitioned
>> table without append of partitions?
>>
>
> Yes. When you join an unpartitioned table with a partitioned table,
> the planner will choose to append all the partitions of the
> partitioned table and then join with the unpartitioned table.
>

I tested this set of patches for TPC-H benchmark and came across following
results,
- total 7 queries were using partition-wise join,

- Q4 attains a speedup of around 80% compared to the partitioned setup
without partition-wise join, the main reason being the poor plan choice at
head for partitioned database.
When I tried this query with forced nested-loop join then it completes in
some 45 seconds at head. So, basically when no partition-wise join is
present because of terrible selectivity estimation optimiser picks up a
hash join plan, which results poorly as the estimated number of rows are
two orders of magnitude lesser than actual.
Note that this is not the effect of [1], I tried this without that patch as
well.

- other queries show a good 20-30% improvement in performance. Performance
numbers are as follows,

Query| un_part_head (seconds) | part_head (seconds) | part_patch (seconds) |
3 | 76 |127 | 88 |
4 |17 | 244 | 41 |
5 | 52 | 123 | 84 |
7 | 73 | 134 | 103 |
10 | 67 | 111 | 89 |
12 | 53 | 114 | 99 |
18 | 447 | 709 | 551 |

The experimental settings used were,

Partitioning: Range partitioning on lineitem and orders on l_orderkey and
o_orderkey respectively. The number and range of partitions were kept same
for both the tables.

Server parameters:
work_mem - 1GB
effective_cache_size - 8GB
shared_buffers - 8GB
enable_partition_wise_join = on

TPC-H setup:
scale-factor - 20

Commit id - 42171e2cd23c8307bbe0ec64e901f58e297db1c3, also, the patch at
[1] was applied in all the cases.
Query plans for the above mentioned queries is attached.

[1] https://www.postgresql.org/message-id/CAEepm%3D3%
3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
part_perf.zip application/zip 207.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2017-07-25 05:41:09 Re: cache lookup failed error for partition key with custom opclass
Previous Message Tom Lane 2017-07-25 01:33:52 Re: Testlib.pm vs msys