Re: Implement missing join selectivity estimation for range types

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, LUO Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Implement missing join selectivity estimation for range types
Date: 2023-11-14 19:46:21
Message-ID: 2516071.1699991181@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be> writes:
> You can find attached a new version of the patch that can be applied on
> the current master branch of postgres.

I took a brief look through this very interesting work. I concur
with Tomas that it feels a little odd that range join selectivity
would become smarter than scalar inequality join selectivity, and
that we really ought to prioritize applying these methods to that
case. Still, that's a poor reason to not take the patch.

I also agree with the upthread criticism that having two identical
functions in different source files will be a maintenance nightmare.
Don't do it. When and if there's a reason for the behavior to
diverge between the range and multirange cases, it'd likely be
better to handle that by passing in a flag to say what to do.

But my real unhappiness with the patch as-submitted is the test cases,
which require rowcount estimates to be reproduced exactly. We know
very well that ANALYZE estimates are not perfectly stable and tend to
vary across platforms. As a quick check I tried the patch within a
32-bit VM, and it passed, which surprised me a bit ... but it would
surprise me a lot if we got these same numbers on every machine in
the buildfarm. We need a more forgiving test method. Usually the
approach is to set up a test case where the improved accuracy of
the estimate changes the planner's choice of plan compared to what
you got before, since that will normally not be too prone to change
from variations of a percent or two in the estimates. Another idea
could be something like

SELECT (estimate/actual BETWEEN 0.9 AND 1.1) AS ok FROM ...

which just gives a true/false output instead of an exact number.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-14 19:47:42 Re: Why do indexes and sorts use the database collation?
Previous Message Jeff Davis 2023-11-14 19:28:50 Re: Why do indexes and sorts use the database collation?