Re: [Proposal] Table partition + join pushdown

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Hiroshi Yanagisawa <hir-yanagisawa(at)ut(dot)jp(dot)nec(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [Proposal] Table partition + join pushdown
Date: 2016-01-26 02:09:58
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80119F667@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Tue, Jan 19, 2016 at 7:59 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> > On Mon, Jan 18, 2016 at 5:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> For
> >> example, suppose that x and y are numeric columns and P(x) is
> >> length(x::text) == 3. Then you could have 1 in one table and 1.0 in
> >> the table; they join, but P(x) is true for one and false for the
> >> other.
> >
> > Fwiw, ages ago there was some talk about having a property on
> > functions "equality preserving" or something like that. If a function,
> > or more likely a <function,operator> tuple had this property set then
> > x op y => f(x) op f(y). This would be most useful for things like
> > substring or hash functions which would allow partial indexes or
> > partition exclusion to be more generally useful.
> >
> > Of course then you really want <f,op1,op2> to indicate that "a op1 b
> > => f(a) op2 f(b)" so you can handle things like <substring,lt,lte > so
> > that "a < b => substring(a,n) <= substring(b,n)" and you need some way
> > to represent the extra arguments to substring and the whole thing
> > became too complex and got dropped.
> >
> > But perhaps even a simpler property that only worked for equality and
> > single-argument functions would be useful since it would let us mark
> > hash functions Or perhaps we only need to mark the few functions that
> > expose properties that don't affect equality since I think there are
> > actually very few of them.
>
> We could certainly mark operators that amount to testing binary
> equality as such, and this optimization could be used for join
> operators so marked. But I worry that would become a crutch, with
> people implementing optimizations that work for such operators and
> leaving numeric (for example) out in the cold. Of course, we could
> worry about such problems when and if they happen, and accept the idea
> of markings for now. However, I'm inclined to think that there's a
> better way to optimize the case Taiki Kondo and Kouhei Kagai are
> targeting.
>
It seems to me Greg's idea intends to reduce CPU cycles by replacement
of the operator in use. I never deny we can have valuable scenarios,
however, we intend this feature to reduce amount of inner hash-table
size, to fit GPU RAM for example.

> If we get declarative partitioning, an oft-requested feature that has
> been worked on by various people over the years and currently by Amit
> Langote, and specifically if we get hash partitioning, then we'll
> presumably use the hash function for the default operator class of the
> partitioning column's datatype to partition the table. Then, if we do
> a join against some other table and consider a hash join, we'll be
> using the same hash function on our side, and either the same operator
> or a compatible operator for some other datatype in the same opfamily
> on the other side. At that point, if we push down the join, we can
> add a filter on the inner side of the join that the hash value of the
> matching column has to map to the partition it's being joined against.
> And we don't get a recurrence of this problem in that case, because
> we're not dealing with an arbitrary predicate - we're dealing with a
> hash function whose equality semantics are defined to be compatible
> with the join operator.
>
> That approach works with any data type that has a default hash
> operator class, which covers pretty much everything anybody is likely
> to care about, including numeric.
>
Except for usage of CHECK constraint, the above description is almost
same as we are intending. Hash joinable operators are expected to check
equality of both side at least, thus, we can predicate which inner
columns shall have identical value once both tuples are joined.
Then, we can filter out rows obviously obviously unmatched rows.

> At that point, if we push down the join, we can
> add a filter on the inner side of the join that the hash value of the
> matching column has to map to the partition it's being joined against.
>
Of course, its implementation is not graceful enough, especially, above
point because this extra filter will change expected number of rows to
be produced by inner relation, and relevant cost.
Right now, his patch calls cost_seqscan() and others according to the
type of inner relation by itself. Of course, it is not a portable way,
if inner relation would not be a simple relations scan.

Due to path construction staging, AppendPath with underlying join paths
has to be constructed on join path investigation steps. So, what is the
reasonable way to make inner relation's path node with filters pushed-
down?
It is the most ugly part of the current patch.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vinayak Pokale 2016-01-26 02:22:44 Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message Alvaro Herrera 2016-01-26 01:22:31 Re: pgbench stats per script & other stuff