From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Reducing duplicativeness of EquivalenceClass-derived clauses |
Date: | 2022-10-26 11:04:56 |
Message-ID: | CAMbWs4-6-ywf8AoQJ9z48q+v1eL5Scsw-d_nsr9s+dm-Ge4VGA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 26, 2022 at 6:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> While fooling with my longstanding outer-join variables changes
> (I am making progress on that, honest), I happened to notice that
> equivclass.c is leaving some money on the table by generating
> redundant RestrictInfo clauses. It already attempts to not generate
> the same clause twice, which can save a nontrivial amount of work
> because we cache selectivity estimates and so on per-RestrictInfo.
> I realized though that it will build and return a clause like
> "a.x = b.y" even if it already has "b.y = a.x". This is just
> wasteful. It's always been the case that equivclass.c will
> produce clauses that are ordered according to its own whims.
> Consumers that need the operands in a specific order, such as
> index scans or hash joins, are required to commute the clause
> to be the way they want it while building the finished plan.
> Therefore, it shouldn't matter which order of the operands we
> return, and giving back the commutator clause if available could
> potentially save as much as half of the selectivity-estimation
> work we do with these clauses subsequently.
>
> Hence, PFA a patch that adjusts create_join_clause() to notice
> commuted as well as exact matches among the EquivalenceClass's
> existing clauses. This results in a number of changes visible in
> regression test cases, but they're all clearly inconsequential.
I think there is no problem with this idea, given the operands of
EC-derived clauses are commutative, and it seems no one would actually
rely on the order of the operands. I can see hashjoin/mergejoin would
commute hash/merge joinclauses if needed with get_switched_clauses().
>
> The only thing that I think might be controversial here is that
> I dropped the check for matching operator OID. To preserve that,
> we'd have needed to use get_commutator() in the reverse-match cases,
> which it seemed to me would be a completely unjustified expenditure
> of cycles. The operators we select for freshly-generated clauses
> will certainly always match those of previously-generated clauses.
> Maybe there's a chance that they'd not match those of ec_sources
> clauses (that is, the user-written clauses we started from), but
> if they don't and that actually makes any difference then surely
> we are talking about a buggy opclass definition.
The operator is chosen according to the two given EC members's data
type. Since we are dealing with the same pair of EC members, I think
the operator is always the same one. So it also seems no problem to drop
the check for operator. I wonder if we can even add an assertion if
we've found a RestrictInfo from ec_derives that the operator matches.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | John Naylor | 2022-10-26 11:06:43 | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Previous Message | David Rowley | 2022-10-26 10:35:37 | Re: Have nodeSort.c use datum sorts single-value byref types |