Re: Implement missing join selectivity estimation for range types

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Implement missing join selectivity estimation for range types
Date: 2026-04-16 04:12:35
Message-ID: CABXr29HuYbrtrThYaE8GYy-wMxtej8sG4Hw+2oSfQvuM+c0XVw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 15, 2026 at 8:13 AM SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be>
wrote:

> Hi Haibo,
>
> Thank you for the review.
>
> > One small note: the localized bool join_is_reversed; in my version was
> > intentional. I left it that way because get_join_variables() wants a
> > storage location, and I preferred to keep that use local and explicit
> > rather than trying to reshape things around it.
>
> Fair point. I moved it out of the bare block because it looked unusual,
> but I can change it back if you prefer.
>
> > For patch 2, I am less convinced, especially for &&.
> > [...]
> > for multiranges there is a third possibility: neither side is entirely
> > left nor entirely right, and yet they still do not overlap because of
> > an internal gap.
>
> This is a valid concern, but it is an existing limitation of multirange
> statistics, not something we are introducing. The existing restriction
> selectivity code in multirangetypes_selfuncs.c already uses the same
> NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
> multirange_typanalyze explicitly says:
>
> /* Treat multiranges like a big range without gaps. */
>
> The statistics only store the outermost bounds, so the gap information
> is already lost before our estimator sees it. The multirange GiST
> opclass does the same (stores the bounding range). Our join estimator
> is just consistent with how multiranges are handled elsewhere.
>
> The alternative is falling back to the 0.005 default, which will almost
> certainly be worse. Would a comment explaining the limitation be enough?
>
Thanks, that is a fair point.

I agree that this is not something patch 2 is uniquely introducing. If the

existing multirange statistics and restriction selectivity already treat a

multirange essentially as its outer bounds, then it makes sense that the

join estimator can only work within that same approximation.

So I am less worried about this as a correctness objection than I was at

first. My main concern is really about making that limitation explicit,

especially for &&, where internal gaps can matter a lot for the real

overlap semantics.

I think it would help if patch 2 said this a bit more directly, both in

the code comments and in the patch description. Something along the lines

of:

- this reuses the same outer-bounds approximation already used by
existing

multirange statistics / restriction selectivity

- internal gaps are not represented in the available stats
- so && for sparse multiranges may still be overestimated in some cases
- but this is still expected to be better than falling back to a fixed
default selectivity

> For patch 3, I agree with the motivation [...] But I am not convinced
> > that exporting those helpers via selfuncs.h is the right boundary.
> > My preference would be something tighter: [...] a backend-private
> > internal header just for the range-family selfuncs code
>
> Good point about the visibility. I'll move the declarations to a
> separate backend-private header in the next version.
>
> Regards,
> Maxime
>

If you are willing to add that clarification, I think that would address

most of my concern here.

Regards,
Haibo

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2026-04-16 05:03:33 Re: Introduce XID age based replication slot invalidation
Previous Message Fujii Masao 2026-04-16 04:11:52 Re: Use XLogRecPtrIsValid() instead of negated XLogRecPtrIsInvalid