Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-07-20 06:17:57
Message-ID: CAFjFpRea5mqMuY9eqt5xGXFaqGk1Lk2z30PgsUnp88fHOJ0ozw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 20, 2017 at 12:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> This suggests that partitioning is not a suitable strategy for this query,
>> but then may be partition wise should not be picked for such a case to
>> aggravate the performance issue.
>
> In the unpartitioned case, and in the partitioned case on head, the
> join order is l1-(nation-supplier)-l2-orders-l3. In the patched case,
> the join order changes to l1-l2-supplier-orders-nation-l3. If the
> planner used the former join order, it wouldn't be able to do a
> partition-wise join at all, so it must think that the l1-l2 join gets
> much cheaper when done partitionwise, thus justifying a change in the
> overall join order to be able to use partion-wise join. But it
> doesn't work out.
>
> I think the problem is that the row count estimates for the child
> joins seem to be totally bogus:
>
> -> Hash Semi Join (cost=309300.53..491665.60 rows=1 width=12)
> (actual time=10484.422..15945.851 rows=1523493 loops=3)
> Hash Cond: (l1.l_orderkey = l2.l_orderkey)
> Join Filter: (l2.l_suppkey <> l1.l_suppkey)
> Rows Removed by Join Filter: 395116
>
> That's clearly wrong. In the un-partitioned plan, the join to l2
> produces about as many rows of output as the number of rows that were
> input (998433 vs. 962909); but here, a child join with a million rows
> as input is estimated to produce only 1 row of output. I bet the
> problem is that the child-join's row count estimate isn't getting
> initialized at all, but then something is clamping it to 1 row instead
> of 0.
>
> So this looks like a bug in Ashutosh's patch.

The patch does not have any changes to the selectivity estimation. It
might happen that some correction in selectivity estimation for
child-joins is required, but I have not spotted any code in
selectivity estimation that differentiates explicitly between child
and parent Vars and estimates. So, I am more inclined to believe
Thomas's theory. I will try Tom's suggested approach.

I am investigating this case with the setup that Rafia provided.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-07-20 06:25:33 Re: PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING
Previous Message Michael Paquier 2017-07-20 06:17:38 Re: Error while copying a large file in pg_rewind