Re: Implement missing join selectivity estimation for range types

From: Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, LUO Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Implement missing join selectivity estimation for range types
Date: 2023-11-20 20:17:22
Message-ID: f530d708-bcb4-1487-f25e-248ea101e571@ulb.be
Lists: pgsql-hackers

On 14/11/2023 20:46, Tom Lane wrote:
> I took a brief look through this very interesting work. I concur
> with Tomas that it feels a little odd that range join selectivity
> would become smarter than scalar inequality join selectivity, and
> that we really ought to prioritize applying these methods to that
> case. Still, that's a poor reason to not take the patch.

Indeed, we started with ranges because this was the simpler case (no MCVs)
and was the topic of a course project.
The idea is to later write a second patch that applies these ideas to
scalar inequality while also handling MCVs correctly.

> I also agree with the upthread criticism that having two identical
> functions in different source files will be a maintenance nightmare.
> Don't do it. When and if there's a reason for the behavior to
> diverge between the range and multirange cases, it'd likely be
> better to handle that by passing in a flag to say what to do.

The duplication is indeed not ideal. However, there are already 8 other
duplicate functions between the two files.
I would thus suggest leaving the duplication in this patch and creating a
second one that removes all duplication from the two files, instead of
just removing the duplication for our new function.
What are your thoughts on this? If we do this, should the function
declarations go in rangetypes.h or should we create a new
rangetypes_selfuncs.h header?
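
For illustration only, here is a minimal sketch of the flag-based approach
suggested in the quoted text: one shared implementation, with thin range and
multirange entry points that differ only in the flag they pass. All names
(join_sel_common, range_join_sel, multirange_join_sel, the DEFAULT_* constants)
are hypothetical and not taken from the patch. The brute-force pair count is
only a placeholder for the real histogram-based estimate, and the differing
fallback values are an assumed example of behavior that might diverge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fallback selectivities, used when a sample is empty. */
#define DEFAULT_RANGE_JOIN_SEL       0.005
#define DEFAULT_MULTIRANGE_JOIN_SEL  0.01

/*
 * Single shared implementation (placeholder logic): the fraction of
 * (a, b) pairs with a < b, counted by brute force over two arrays.
 * The is_multirange flag selects the only behavior that differs here,
 * namely which fallback value is returned for empty input.
 */
static double
join_sel_common(const double *a, size_t na,
                const double *b, size_t nb,
                bool is_multirange)
{
    size_t matches = 0;

    if (na == 0 || nb == 0)
        return is_multirange ? DEFAULT_MULTIRANGE_JOIN_SEL
                             : DEFAULT_RANGE_JOIN_SEL;

    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] < b[j])
                matches++;

    return (double) matches / ((double) na * (double) nb);
}

/* Thin wrapper for the range case. */
double
range_join_sel(const double *a, size_t na, const double *b, size_t nb)
{
    return join_sel_common(a, na, b, nb, false);
}

/* Thin wrapper for the multirange case. */
double
multirange_join_sel(const double *a, size_t na, const double *b, size_t nb)
{
    return join_sel_common(a, na, b, nb, true);
}
```

With this shape, only the two wrapper prototypes would need to be visible to
both source files, whichever header they end up in.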

> But my real unhappiness with the patch as-submitted is the test cases,
> which require rowcount estimates to be reproduced exactly.

> We need a more forgiving test method. Usually the
> approach is to set up a test case where the improved accuracy of
> the estimate changes the planner's choice of plan compared to what
> you got before, since that will normally not be too prone to change
> from variations of a percent or two in the estimates.

I have changed the test method to produce query plans for a 3-way range
join.
The plans for the different operators now differ because of the computed
selectivity estimates, which was not the case before this patch.

Regards,
Maxime Schoemans

Attachment: v3-0001-Join-Selectivity-Estimation-for-Range-types.patch (text/x-patch, 61.2 KB)