Re: Partition-wise join for join between (declaratively) partitioned tables

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-07-26 09:28:20
Message-ID: CAOGQiiONzaScLtZZH_BsDEg+-H9=V=WXYkQwo+rGk1BO1JJNCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 26, 2017 at 10:38 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
> On Tue, Jul 25, 2017 at 9:39 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > On Tue, Jul 25, 2017 at 8:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
> >> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
> >>> - other queries show a good 20-30% improvement in performance. Performance
> >>> numbers are as follows,
> >>>
> >>> Query| un_part_head (seconds) | part_head (seconds) | part_patch (seconds) |
> >>> 3 | 76 |127 | 88 |
> >>> 4 |17 | 244 | 41 |
> >>> 5 | 52 | 123 | 84 |
> >>> 7 | 73 | 134 | 103 |
> >>> 10 | 67 | 111 | 89 |
> >>> 12 | 53 | 114 | 99 |
> >>> 18 | 447 | 709 | 551 |
> >>
> >> Hmm. This certainly shows that benefit of the patch, although it's
> >> rather sad that we're still slower than if we hadn't partitioned the
> >> data in the first place. Why does partitioning hurt performance so
> >> much?
> >
> > I was analysing some of the plans (without partition and with
> > partition), Seems like one of the reasons of performing worse with the
> > partitioned table is that we can not use an index on the partitioned
> > table.
> >
> > Q4 is taking 17s without partition whereas it's taking 244s with partition.
> >
> > Now if we analyze the plan
> >
> > Without partition, it can use parameterize index scan on lineitem
> > table which is really big in size. But with partition, it has to scan
> > this table completely.
> >
> > -> Nested Loop Semi Join
> > -> Parallel Bitmap Heap Scan on orders
> > -> Bitmap Index Scan on
> > idx_orders_orderdate (cost=0.00..24378.88 r
> > -> Index Scan using idx_lineitem_orderkey on
> > lineitem (cost=0.57..29.29 rows=105 width=8) (actual
> > time=0.031..0.031 rows=1 loops=1122364)
> > Index Cond: (l_orderkey =
> > orders.o_orderkey)
> > Filter: (l_commitdate < l_receiptdate)
> > Rows Removed by Filter: 1
> >
>
> If the partitions have the same indexes as the unpartitioned table,
> planner manages to create parameterized plans for each partition and
> thus parameterized plan for the whole partitioned table. Do we have
> same indexes on unpartitioned table and each of the partitions? The

Yes both lineitem and orders have same number of partitions viz 17 and
on the same partitioning key (*_orderkey) and same ranges for each
partition. However, I missed creating the index on o_orderdate for the
partitions. But on creating it as well, the plan with bitmap heap scan
is used and it still completes in some 200 seconds, check the attached
file for the query plan.

> difference between the two cases is the parameterized path on an
> unpartitioned table scans only one index whereas that on the
> partitioned table scans the indexes on all the partitions. My guess is
> the planner thinks those many scans are costlier than hash/merge join
> and chooses those strategies over parameterized nest loop join. In
> case of partition-wise join, only one index on the inner partition is
> involved and thus partition-wise join picks up parameterized nest loop
> join. Notice, that this index is much smaller than the index on the
> partitioned table, so the index scan will be a bit faster. But only a
> bit, since the depth of the index doesn't increase linearly with the
> size of index.
>
As I have observed, the thing with this query is that selectivity
estimation is too high than actual, now when index scan is chosen for
lineitem being in the inner side of NLJ, the query completes quickly
since the number of actual returned rows is too low. However, in case
we pick seq scan, or lineitem is on the outer side, the query is going
to take a really long time. Now, when Hash-Join is picked in the case
of partitioned database and no partition-wise join is available, seq
scan is preferred instead of index scan and hence the elongated query
execution time.

I tried this query with random_page_cost = 0 and forcing NLJ and the
chosen plan completes the query in 45 seconds, check the attached file
for explain analyse output.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
q4_idx_orderdate.out application/octet-stream 13.2 KB
Q4_low_random_page_cost.out application/octet-stream 15.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rafia Sabih 2017-07-26 09:34:46 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Masahiko Sawada 2017-07-26 08:38:32 Re: Block level parallel vacuum WIP