Re: Implement missing join selectivity estimation for range types

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, LUO Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Implement missing join selectivity estimation for range types
Date: 2026-04-15 00:53:31
Message-ID: CABXr29Gts+FuMRkeA=JNAxB700W1EQW2F=W2wAjtJrGZDGTxGw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 14, 2026 at 7:03 AM SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be>
wrote:

> Hi Haibo,
>
> Thank you for picking this up again. I agree with the changes you made
> in v5, in particular scoping the patch to the three strict operators and
> reworking the tests to check plan structure rather than exact row counts.
>
> Attached is v6 as a 3-patch series building on your v5.
>
> Patch 1 is your range join selectivity patch with one small change: the
> range_cmp_bounds result in the merge walk is stored in a local cmp
> variable to avoid calling it twice per iteration, as jian he suggested.
>

Hi Maxime,
Thank you for working on this and for building on top of v5.

I think patch 1 looks good to me. I do not see major issues there. One
small note: the localized bool join_is_reversed; in my version was
intentional. I left it that way because get_join_variables() wants a
storage location, and I preferred to keep that use local and explicit
rather than trying to reshape things around it.

>
> Patch 2 adds the same estimation for multirange types, covering all type
> combinations (multirange x multirange, multirange x range, range x
> multirange). Since both range and multirange types use the same bound
> histogram format and the same RangeBound representation, the core
> algorithm is identical.
>
For patch 2, I am less convinced, especially for &&.

My concern is not so much whether the code runs, but whether the semantic
argument is strong enough for the mixed range/multirange cases. The patch
assumes that because range and multirange use the same bounds histogram
format and the same RangeBound representation, the same estimator can be
applied directly. I think that argument is much easier to make for << and >>
than for &&.

For single ranges, && works nicely with the usual decomposition because if
two single ranges do not overlap, then one must be entirely to the left or
entirely to the right of the other. But for multiranges there is a third
possibility: neither side is entirely left nor entirely right, and yet they
still do not overlap because of an internal gap.
For example:

A = {[1,2), [100,101)}

B = [50,60)

Here:

A << B is false
A >> B is false
A && B is also false

So for multiranges, “not left and not right” does not imply overlap in the
same way it does for single ranges. That makes me worry that reusing the
same estimator logic for &&, especially in mixed range/multirange cases,
may overestimate overlap because the overall lower/upper bounds do not
capture the internal holes.

So I think patch 2 still needs a stronger justification there, and probably
more targeted tests around sparse multiranges / hole cases, especially for
&&.

>
> Patch 3 removes the duplication between rangetypes_selfuncs.c and
> multirangetypes_selfuncs.c that Tom raised as a concern. It makes the
> 10 shared helper functions non-static, exports them via selfuncs.h,
> and deletes the copies from the multirange file. This covers all the
> pre-existing duplication between the two files, not just the functions
> added in this patch set.
>
> Regards,
> Maxime
>
For patch 3, I agree with the motivation: the duplication between
rangetypes_selfuncs.c and multirangetypes_selfuncs.c is not ideal. But I am
not convinced that exporting those helpers via selfuncs.h is the right
boundary.

My preference would be something tighter:

-

keep the shared helper implementations in one place
-

add a backend-private internal header just for the range-family selfuncs
code
-

include that internal header from rangetypes_selfuncs.c and
multirangetypes_selfuncs.c
-

avoid widening visibility by turning a group of file-local helpers into
broader extern declarations in selfuncs.h

In other words, I agree that the duplication should be removed, but I think
a backend-private internal header should be enough for that goal. I do not
think we need to expand visibility more than necessary by moving these
helpers out of the file-private space into a broader interface.

Thanks again for working on this.

Regards,

Haibo

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-15 00:56:02 Re: [PATCH] Miscellaneous little fixes
Previous Message Richard Guo 2026-04-15 00:51:03 Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints