| From: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com> |
|---|---|
| To: | SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement missing join selectivity estimation for range types |
| Date: | 2026-04-16 04:12:35 |
| Message-ID: | CABXr29HuYbrtrThYaE8GYy-wMxtej8sG4Hw+2oSfQvuM+c0XVw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Apr 15, 2026 at 8:13 AM SCHOEMANS Maxime <maxime(dot)schoemans(at)ulb(dot)be>
wrote:
> Hi Haibo,
>
> Thank you for the review.
>
> > One small note: the localized bool join_is_reversed; in my version was
> > intentional. I left it that way because get_join_variables() wants a
> > storage location, and I preferred to keep that use local and explicit
> > rather than trying to reshape things around it.
>
> Fair point. I moved it out of the bare block because it looked unusual,
> but I can change it back if you prefer.
>
> > For patch 2, I am less convinced, especially for &&.
> > [...]
> > for multiranges there is a third possibility: neither side is entirely
> > left nor entirely right, and yet they still do not overlap because of
> > an internal gap.
>
> This is a valid concern, but it is an existing limitation of multirange
> statistics, not something we are introducing. The existing restriction
> selectivity code in multirangetypes_selfuncs.c already uses the same
> NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
> multirange_typanalyze explicitly says:
>
> /* Treat multiranges like a big range without gaps. */
>
> The statistics only store the outermost bounds, so the gap information
> is already lost before our estimator sees it. The multirange GiST
> opclass does the same (stores the bounding range). Our join estimator
> is just consistent with how multiranges are handled elsewhere.
>
> The alternative is falling back to the 0.005 default, which will almost
> certainly be worse. Would a comment explaining the limitation be enough?
>
Thanks, that is a fair point.
I agree that this is not something patch 2 is uniquely introducing. If the
existing multirange statistics and restriction selectivity already treat a
multirange essentially as its outer bounds, then it makes sense that the
join estimator can only work within that same approximation.
So I am less worried about this as a correctness objection than I was at
first. My main concern is really about making that limitation explicit,
especially for &&, where internal gaps can matter a lot for the real
overlap semantics.
I think it would help if patch 2 said this a bit more directly, both in
the code comments and in the patch description. Something along the lines
of:
- this reuses the same outer-bounds approximation already used by
existing
multirange statistics / restriction selectivity
- internal gaps are not represented in the available stats
- so && for sparse multiranges may still be overestimated in some cases
- but this is still expected to be better than falling back to a fixed
default selectivity
> For patch 3, I agree with the motivation [...] But I am not convinced
> > that exporting those helpers via selfuncs.h is the right boundary.
> > My preference would be something tighter: [...] a backend-private
> > internal header just for the range-family selfuncs code
>
> Good point about the visibility. I'll move the declarations to a
> separate backend-private header in the next version.
>
> Regards,
> Maxime
>
If you are willing to add that clarification, I think that would address
most of my concern here.
Regards,
Haibo
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bharath Rupireddy | 2026-04-16 05:03:33 | Re: Introduce XID age based replication slot invalidation |
| Previous Message | Fujii Masao | 2026-04-16 04:11:52 | Re: Use XLogRecPtrIsValid() instead of negated XLogRecPtrIsInvalid |