Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Reducing duplicativeness of EquivalenceClass-derived clauses
Date: 2022-10-26 13:54:17
Message-ID: 217731.1666792457@sss.pgh.pa.us
Lists: pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> On Wed, Oct 26, 2022 at 6:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The only thing that I think might be controversial here is that
>> I dropped the check for matching operator OID. To preserve that,
>> we'd have needed to use get_commutator() in the reverse-match cases,
>> which it seemed to me would be a completely unjustified expenditure
>> of cycles. The operators we select for freshly-generated clauses
>> will certainly always match those of previously-generated clauses.
>> Maybe there's a chance that they'd not match those of ec_sources
>> clauses (that is, the user-written clauses we started from), but
>> if they don't and that actually makes any difference then surely
>> we are talking about a buggy opclass definition.

> The operator is chosen according to the two given EC members' data
> type. Since we are dealing with the same pair of EC members, I think
> the operator is always the same one. So it also seems no problem to drop
> the check for operator. I wonder if we can even add an assertion if
> we've found a RestrictInfo from ec_derives that the operator matches.

Yeah, I considered that --- even if somehow an ec_sources entry isn't
an exact match, ec_derives ought to be. However, it still didn't seem
worth a get_commutator() call. We'd basically be expending cycles to
check that select_equality_operator yields the same result with the same
inputs as it did before, and that doesn't seem terribly interesting to
check. I'm also not sure what's the point of allowing divergence
from the requested operator in some but not all paths.
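
For illustration only, here is a toy C sketch of the kind of lookup being discussed: a cached clause is accepted if it joins the same two members in either order, and the operator is never compared, so no commutator lookup is needed. ToyMember, ToyClause, and toy_find_clause are hypothetical stand-ins invented for this sketch; they are not PostgreSQL's structs or the patch's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for an equivalence-class member. */
typedef struct ToyMember
{
    int         id;
} ToyMember;

/* Hypothetical stand-in for a cached derived clause. */
typedef struct ToyClause
{
    const ToyMember *left_em;
    const ToyMember *right_em;
    unsigned int opno;          /* operator id; deliberately never compared */
} ToyClause;

/*
 * Return the first cached clause that joins the two given members,
 * in either orientation, or NULL if there is none.
 */
const ToyClause *
toy_find_clause(const ToyClause *clauses, size_t nclauses,
                const ToyMember *leftem, const ToyMember *rightem)
{
    for (size_t i = 0; i < nclauses; i++)
    {
        const ToyClause *c = &clauses[i];

        /* forward match */
        if (c->left_em == leftem && c->right_em == rightem)
            return c;
        /* reverse match: same pair, sides swapped */
        if (c->left_em == rightem && c->right_em == leftem)
            return c;
    }
    return NULL;
}
```

In this sketch the reverse-match branch costs one extra pair of pointer comparisons per entry. Checking the operator as well would mean mapping the stored operator to its commutator before comparing, which is the cost the message argues is not worth paying.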

I added a bit of instrumentation to count how many times we need to build
new join clauses in create_join_clause. In the current core regression
tests, I see this change reducing the number of new join clauses built
here from 9673 to 5142 (out of 26652 calls). So not quite 50% savings,
but pretty close to it. That should mean that this change is about
a wash just in terms of the code it touches directly: each iteration
of the search loops is nearly twice as expensive as before, but we'll
only need to do about half as many. So whatever we save downstream
is pure gravy.
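
The arithmetic behind those figures can be checked with a few lines of C. The counts are the ones quoted above; the function names are made up for this sketch, and relative_search_cost is only a crude reading of the "about a wash" argument, assuming loop work scales with the number of clauses built and that each iteration costs a fixed factor more than before.

```c
/* Counts quoted in the message above. */
#define BUILDS_BEFORE 9673L
#define BUILDS_AFTER  5142L
#define TOTAL_CALLS   26652L

/* Fraction of clause builds avoided by the change. */
double
fraction_saved(long before, long after)
{
    return (double) (before - after) / (double) before;
}

/* Fraction of calls that still had to build a new clause. */
double
build_rate(long builds, long calls)
{
    return (double) builds / (double) calls;
}

/*
 * Crude model (an assumption, not a measurement): search cost relative
 * to the old code if each iteration costs per_iteration_factor times as
 * much and the amount of work scales with the number of builds.
 */
double
relative_search_cost(double per_iteration_factor, long before, long after)
{
    return per_iteration_factor * (double) after / (double) before;
}
```

With these inputs, fraction_saved(BUILDS_BEFORE, BUILDS_AFTER) is about 0.468, and the share of calls that build a new clause falls from about 36.3% to about 19.3%. Under the crude model, a per-iteration factor of 2.0 gives a relative cost of about 1.06, consistent with "about a wash".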

regards, tom lane
