Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2018-07-27 07:17:04
Message-ID: CAFjFpRdEawGntehu9oKNFZvm2JAS5wJ_qT+5vrvwWt=h8GKUbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 26, 2018 at 8:37 PM, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>> >
>> > It's of course wrong, it's going to be O(max(m, n)) as you said, but
>> > the point is still valid - if we have partitions A1, A2 from one side
>> > and B1, ..., BN on another side, we can skip necessary the
>> > partitions from B that are between e.g. A1 and A2 faster with
>> > binary search.
>>
>> That's possible only when there is no default partition and the join
>> is an inner join. For an outer join, we need to include all the
>> partitions on the outer side, so we can't just skip over some
>> partitions. In case of a default partition, it can take place of a
>> missing partition, so we can't skip partitions using binary search.
>
> But in this case I described above (when we have a partition A3 on one side,
> and another matching partition B3 from other side, but separated by some other
> partitions, e.g. B1, B2) it means that we will merge A3 with a default
> partition from other side without actually needing that, am I right? In this
> sense it's an overhead out of nothing.

No. We will join A3 with B3 since they have matching bounds. We will
compare B1 and B2's bounds with A3 (assuming that there are no bounds
before A3). Since they won't be compatible we will match default of A
to B1 and B2. That will of-course fail since we will try to match
multiple partitions to a single partition, but that's not the point of
your question I think. You are right that we could skip comparing A3
with B1 and B2 and directly land on B3. Any partitions skipped in
between will be matched with A's default partition. But as I have said
this would be rare and complicating the logic for a rare case doesn't
seem practical at this stage.

Apart from the complexity there's also a possibility that this
skipping will reduce the efficiency actually in normal cases. Consider
a case where A and B have exactly matching partitions. Current
partition matching algorithm compare any given range/bound only once
and we will have O(n) algorithm. If we use a binary search, however,
for every range comparison, it will be O(n log n) algorithm. There
will be unnecessary comparisons during binary search. The current
algorithm is always O(n), whereas binary search would be O(n log(n))
with a possibility of having sub-O(n) complexity in some cases. I
would go for an algorithm which has a consistent time complexity
always and which is efficient in normal cases, rather than worrying
about some cases which are not practical.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2018-07-27 08:10:58 Re: Early WIP/PoC for inlining CTEs
Previous Message Amit Langote 2018-07-27 07:11:11 Re: Speeding up INSERTs and UPDATEs to partitioned tables