Re: Asymmetric partition-wise JOIN

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)heterodb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2020-08-26 13:32:55
Message-ID: CAAJ_b96=OF43+7k=ow5pFjSDLhGJkJsgNrbz7qchHiO7ont6vw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 24, 2019 at 2:03 PM Kohei KaiGai <kaigai(at)heterodb(dot)com> wrote:
>
> On Sat, Aug 24, 2019 at 7:02 Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai <kaigai(at)heterodb(dot)com> wrote:
> > > We can consider that the table join ptable X t1 above is equivalent to:
> > > (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > > = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > > It returns an equivalent result; however, rows are already reduced by the
> > > HashJoin in each individual leaf of the Append, so the CPU cycles consumed
> > > by the Append node can be cheaper.
> > >
> > > On the other hand, it has a downside because t1 must be read 3 times and
> > > the hash table must also be built 3 times. That increases the expected
> > > cost, so the planner may not choose the asymmetric partition-wise join
> > > plan.
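> > >
> > > (Spelled out as SQL, assuming three leaf partitions and an equi-join
> > > on dist, just for illustration:)
> > >
> > > -- the partitioned join ...
> > > SELECT * FROM ptable p JOIN t1 ON p.dist = t1.dist;
> > > -- ... is logically equivalent to joining each leaf separately:
> > > SELECT * FROM ptable_p0 p JOIN t1 ON p.dist = t1.dist
> > > UNION ALL
> > > SELECT * FROM ptable_p1 p JOIN t1 ON p.dist = t1.dist
> > > UNION ALL
> > > SELECT * FROM ptable_p2 p JOIN t1 ON p.dist = t1.dist;
> > > -- but each UNION ALL branch scans t1 and builds its own hash table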
> >
> > What if you include the partition constraint as a filter on t1? So you get:
> >
> > ptable X t1 =
> > (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
> > (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
> > (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
> > (ptable_p3 X (σ hash(dist)%4=3 (t1)))
> >
> > Pros:
> > 1. The hash tables will not contain unnecessary junk.
> > 2. You'll get the right answer if t1 is on the outer side of an outer join.
> > 3. If this runs underneath a Parallel Append and t1 is big enough
> > then workers will hopefully cooperate and do a synchronised scan of
> > t1.
> > 4. The filter might enable a selective and efficient plan like an index scan.
> >
> > Cons:
> > 1. The filter might not enable a selective and efficient plan, and
> > therefore cause extra work.
> >
> > (It's a little weird in this example because we don't usually see hash
> > functions in WHERE clauses, but that could just as easily be dist
> > BETWEEN 1 AND 99 or any other partition constraint.)
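> >
> > (As SQL for one branch, assuming ptable is hash-partitioned on dist with
> > modulus 4 and that ptable_p0 holds remainder 0; satisfies_hash_partition()
> > is used here to express the leaf's constraint, so only t1 rows that could
> > possibly match ptable_p0 end up in that branch's hash table:)
> >
> > SELECT *
> >   FROM ptable_p0 p
> >   JOIN (SELECT * FROM t1
> >          WHERE satisfies_hash_partition('ptable'::regclass, 4, 0, t1.dist)
> >        ) filtered_t1
> >     ON p.dist = filtered_t1.dist;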
> >
> It requires that the join key include the partition key and that the join
> be an equality join, doesn't it?
> If ptable and t1 are joined using ptable.dist = t1.foo, we can indeed
> distribute t1 to each leaf table with "WHERE hash(foo)%4 = xxx" according
> to the partition bounds.
>
> In cases where some of the partition leaves are pruned, it is clearly
> beneficial, because the rows that would only be referenced by the pruned
> child relations are a waste of memory.
>
> On the other hand, it eventually consumes almost the same amount of memory
> to load the inner relations if no leaves are pruned and if we could extend
> the Hash node to share the hash table with sibling join nodes.
>
> > > One idea I have is that the sibling HashJoins share a hash table that was
> > > built once by any of the sibling Hash plans. Right now, it is not
> > > implemented yet.
> >
> > Yeah, I've thought a little bit about that in the context of Parallel
> > Repartition. I'm interested in combining intra-node partitioning
> > (where a single plan node repartitions data among workers on the fly)
> > with inter-node partitioning (like PWJ, where partitions are handled
> > by different parts of the plan, considered at planning time); you
> > finish up needing to have nodes in the plan that 'receive' tuples for
> > each partition, to match up with the PWJ plan structure. That's not
> > entirely unlike CTE references, and not entirely unlike your idea of
> > somehow sharing the same hash table. I ran into a number of problems
> > while thinking about that, which I should write about in another
> > thread.
> >
> Hmm. Do you intend that the inner path may behave differently according to
> the partition-bounds definition of the outer path it is joined to?
> Let me investigate its pros & cons.
>
> The reasons why I think the idea of sharing the same hash table is
> reasonable in this scenario are:
> 1. We can easily extend the idea for parallel optimization. A hash table on
> a DSM segment, once built, can be shared by all the siblings in all the
> parallel workers.
> 2. We can save memory regardless of the join keys and partition keys, even
> if these are not involved in the query.
>
> On the other hand, below are the downsides. Potentially, combined use of
> your idea may help these cases:
> 3. The distributed inner relation cannot be the outer side of an outer join.
> 4. The hash table contains rows that are referenced only by pruned
> partition leaves.
>

+ many, for the sharable hash of the inner table of the join. IMHO, this
could be the most interesting and compelling part of this feature.
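
To make sure I follow, the shape I have in mind is roughly the one below
(only a sketch; table and column names are borrowed from upthread and the
partitioning details are assumed):

-- ptable partitioned on dist, t1 a plain table
EXPLAIN
SELECT *
  FROM ptable p
  JOIN t1 ON p.dist = t1.dist;

-- hoped-for plan shape with a sharable inner hash, roughly:
--   Append
--     -> Hash Join (ptable_p0 x t1) -> Hash on t1, built once
--     -> Hash Join (ptable_p1 x t1) -> reuses the same hash table
--     -> Hash Join (ptable_p2 x t1) -> reuses the same hash table
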
But it might be a complicated piece; is that still on the plan?

Regards,
Amul
