Quick Links

Re: Reducing duplicativeness of EquivalenceClass-derived clauses

From:	Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Reducing duplicativeness of EquivalenceClass-derived clauses
Date:	2022-10-27 13:21:17
Message-ID:	2faecd55-6786-471c-84ac-36aa7149abee@Spark
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

HI,

On Oct 26, 2022, 06:09 +0800, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, wrote:
> While fooling with my longstanding outer-join variables changes
> (I am making progress on that, honest), I happened to notice that
> equivclass.c is leaving some money on the table by generating
> redundant RestrictInfo clauses. It already attempts to not generate
> the same clause twice, which can save a nontrivial amount of work
> because we cache selectivity estimates and so on per-RestrictInfo.
> I realized though that it will build and return a clause like
> "a.x = b.y" even if it already has "b.y = a.x". This is just
> wasteful. It's always been the case that equivclass.c will
> produce clauses that are ordered according to its own whims.
> Consumers that need the operands in a specific order, such as
> index scans or hash joins, are required to commute the clause
> to be the way they want it while building the finished plan.
> Therefore, it shouldn't matter which order of the operands we
> return, and giving back the commutator clause if available could
> potentially save as much as half of the selectivity-estimation
> work we do with these clauses subsequently.
>
> Hence, PFA a patch that adjusts create_join_clause() to notice
> commuted as well as exact matches among the EquivalenceClass's
> existing clauses. This results in a number of changes visible in
> regression test cases, but they're all clearly inconsequential.
>
> The only thing that I think might be controversial here is that
> I dropped the check for matching operator OID. To preserve that,
> we'd have needed to use get_commutator() in the reverse-match cases,
> which it seemed to me would be a completely unjustified expenditure
> of cycles. The operators we select for freshly-generated clauses
> will certainly always match those of previously-generated clauses.
> Maybe there's a chance that they'd not match those of ec_sources
> clauses (that is, the user-written clauses we started from), but
> if they don't and that actually makes any difference then surely
> we are talking about a buggy opclass definition.
>
> I've not bothered to make any performance tests to see if there's
> actually an easily measurable gain here. Saving some duplicative
> selectivity estimates could be down in the noise ... but it's
> surely worth the tiny number of extra tests added here.
>
> Comments?
>
> regards, tom lane
>
Make sense.

How about combine ec->ec_sources and ec->derives as one list for less codes?

```
foreach(lc, list_union(ec->ec_sources, ec->ec_derives))
{
rinfo = (RestrictInfo *) lfirst(lc);
if (rinfo->left_em == leftem &&
rinfo->right_em == rightem &&
rinfo->parent_ec == parent_ec)
return rinfo;
if (rinfo->left_em == rightem &&
rinfo->right_em == leftem &&
rinfo->parent_ec == parent_ec)
return rinfo;
}
```
I have a try, it will change some in join.out and avoid changes in tidscan.out.

Regards,
Zhang Mingli
>

In response to

Reducing duplicativeness of EquivalenceClass-derived clauses at 2022-10-25 22:09:06 from Tom Lane

Responses

Re: Reducing duplicativeness of EquivalenceClass-derived clauses at 2022-10-27 13:29:02 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2022-10-27 13:24:46	Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql
Previous Message	Matthias van de Meent	2022-10-27 13:17:00	Re: [PATCHES] Post-special page storage TDE support