Re: A problem about partitionwise join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: A problem about partitionwise join
Date: 2024-05-03 11:47:45
Message-ID: CAMbWs49fL1ZscsR-6=8bFAfLjt9=TMmpMv1ZHTUrHASAgxU7qQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 1, 2024 at 1:31 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think it's slightly questionable whether this patch is worthwhile.
> The case memorialized in the regression tests, t1.a = t2.a AND t1.a =
> t2.b, is a very weird thing to do. The case mentioned in the original
> email, foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16, seems like
> something that could realistically happen, especially when there are
> views in use (e.g. the view joins foo and bar, and then someone
> queries the view for one of the join columns). In such a case, it's
> possible that foo.k2 = 16 is selective enough that we really don't
> care about partition-wise join any more, but it's also possible that
> it's not too selective and we do care about partition-wise join. So I
> don't think that the case that the patch fixes is something that can
> ever happen, but I do think it's probably fairly rare that brings any
> benefit, which is why I thought that EC-based matching was an OK
> approach to this problem initially. Perhaps that was the wrong idea,
> though.

Thank you for taking the time to review this patch! I think Ashutosh
also mentioned that the new added test case looks artificial. I must
admit that I'm not too sure how common we encounter queries with
partially-redundant join clauses in real-life scenarios. It is possible
that such cases are quite rare, and this patch will then not be of much
use.

I initially brought up this issue because I noticed an inconsistency
regarding the generation of a partition-wise join: with 't1.k = t2.k and
t1.k = t2.val' we are able to generate a partition-wise join, while its
equivalent form 't1.k = t2.val and t1.k = t2.k' does not result in a
partition-wise join. I think this inconsistency could be confusing.

The reason behind this is that with 't1.k = t2.val and t1.k = t2.k' it
happens to constrain other members (t1.k and t2.val) of the EC than the
ones we are looking for (t1.k and t2.k). Our current code looks through
the join's restriction clauses for matched keys. In addition to that,
this patch checks to see if any unmatched keys are known equal by ECs,
leveraging function exprs_known_equal().

> Does the additional logic added by this patch have a noticeable
> performance cost?

I think one concern regarding performance cost is that the function
exprs_known_equal() would be called O(N^2) times, where N is the number
of partition key expressions. But I think this might not be a problem.
The number of a joinrel's partition key expressions would only be equal
to the join degree, since each base relation within the join contributes
only one partition key expression, according to
set_joinrel_partition_key_exprs(). This number would not scale with the
number of partitions. But I have not measured the performance in
practice by running benchmarks. Maybe I'm just wrong.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-05-03 11:48:20 Re: wrong comment in libpq.h
Previous Message Ashutosh Bapat 2024-05-03 09:32:52 Re: tablecmds.c/MergeAttributes() cleanup