Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2019-07-31 05:47:41
Message-ID: CA+HiwqHAAi0Ni56k0f_Sv8gAPdjRwEWYeHPQeb_+ah8fCzFNZw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 30, 2019 at 6:00 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Fri, Jul 19, 2019 at 10:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Thu, Jul 18, 2019 at 2:55 AM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > > I.e., partition_bounds_merge() is performed for each pair of input
> > > partitioned relations for a join relation in try_partitionwise_join().
> > > Since partition_bounds_merge() would need a lot of CPU cycles, I don't
> > > think this is acceptable; ISTM that some redesign is needed to avoid
> > > this. I'm wondering that once we successfully merged partition bounds
> > > from a pair of input partitioned relations for the join relation, by
> > > using the merged partition bounds, we could get the lists of matching
> > > to-be-joined partitions for subsequent pairs of input partitioned
> > > relations for the join relation in a more efficient way than by
> > > performing partition_bounds_merge() as proposed in the patch.
> >
> > I don't know whether partition_bounds_merge() is well-implemented; I
> > haven't looked.
>
> My concern about that is list partitioning. In that case that
> function calls partition_list_bounds_merge(), which generates the
> partition bounds for a join relation between two given input
> relations, by performing merge join for a pair of the datums arrays
> from both the input relations.

I had similar thoughts upon seeing that partition_bounds_merge() will
replace the current way of determining whether partition-wise join can
occur: it will make the handling of currently supported cases more
expensive.

The current way is to compare the PartitionBoundInfos of joining
relations using partition_bounds_equal(), and if equal, simply join
the pairs of matching partitions if the join quals permit doing so.
There's no need to do anything extra to determine which partitions to
join with each other, because it's already established. Likewise,
partition_bounds_merge() shouldn't have to do anything extra in that
case. That is, for the cases that are already supported, we should
find a way to make partition_bounds_merge() only as expensive as just
performing partition_bounds_equal(), or maybe just slightly more.
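
To illustrate the idea, here is a minimal standalone sketch, not code from
the patch. SketchBounds, sketch_bounds_equal() and
sketch_match_partitions() are hypothetical, heavily simplified stand-ins
for PartitionBoundInfo, partition_bounds_equal() and
partition_bounds_merge(); it assumes list partitioning with plain int
datums. The point is only the control flow: try the cheap equality check
first, and fall back to the merge-join style walk over the two sorted
datum arrays only when the bounds differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for PartitionBoundInfo (list case). */
typedef struct SketchBounds
{
    int         ndatums;    /* number of list values */
    const int  *datums;     /* list values, sorted ascending */
    const int  *indexes;    /* indexes[i] = partition accepting datums[i] */
} SketchBounds;

typedef enum
{
    MATCH_FAILED,           /* no one-to-one matching found */
    MATCH_BY_EQUALITY,      /* bounds identical, fast path taken */
    MATCH_BY_MERGE          /* bounds differ, matched by merging */
} MatchResult;

/* Cheap check, analogous in spirit to partition_bounds_equal(). */
static bool
sketch_bounds_equal(const SketchBounds *a, const SketchBounds *b)
{
    if (a->ndatums != b->ndatums)
        return false;
    for (int i = 0; i < a->ndatums; i++)
    {
        if (a->datums[i] != b->datums[i] || a->indexes[i] != b->indexes[i])
            return false;
    }
    return true;
}

/*
 * Set map[p] to the inner partition to be joined with outer partition p,
 * or -1 if there is none.  "nparts" is the number of outer partitions and
 * the length of "map".
 */
static MatchResult
sketch_match_partitions(const SketchBounds *outer, const SketchBounds *inner,
                        int *map, int nparts)
{
    int         i = 0;
    int         j = 0;

    for (int p = 0; p < nparts; p++)
        map[p] = -1;

    /* Fast path: identical bounds, so partition p simply joins with p. */
    if (sketch_bounds_equal(outer, inner))
    {
        for (int k = 0; k < outer->ndatums; k++)
        {
            assert(outer->indexes[k] >= 0 && outer->indexes[k] < nparts);
            map[outer->indexes[k]] = outer->indexes[k];
        }
        return MATCH_BY_EQUALITY;
    }

    /* Slow path: merge-join style walk over the two sorted datum arrays. */
    while (i < outer->ndatums && j < inner->ndatums)
    {
        if (outer->datums[i] < inner->datums[j])
            i++;
        else if (outer->datums[i] > inner->datums[j])
            j++;
        else
        {
            int         op = outer->indexes[i];
            int         ip = inner->indexes[j];

            assert(op >= 0 && op < nparts);

            /*
             * Give up if an outer partition would need two different inner
             * partitions.  This sketch checks that direction only; a real
             * implementation would have to check the reverse as well.
             */
            if (map[op] != -1 && map[op] != ip)
                return MATCH_FAILED;
            map[op] = ip;
            i++;
            j++;
        }
    }
    return MATCH_BY_MERGE;
}
```

With something shaped like this, the cases that are supported today would
pay for one pass of equality comparison and a trivial mapping, and only
the new cases would pay for the merge.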

Thanks,
Amit
