| From: | Enrique Sánchez <enriqueesanchz(at)gmail(dot)com> |
|---|---|
| To: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> |
| Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Subject: | Re: Extended statistics improvement: multi-column MCV missing values |
| Date: | 2026-06-07 15:22:41 |
| Message-ID: | CAOCkzAnYgzn0ZN4KUPngyiHQdcKHbF=KAxmCR-ObJVKirQvVLw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
> 1. Full-dimensional top-level AND equality miss, using the
> ndistinct-based average estimate when matching ndistinct statistics
> exist, with the least-MCV frequency as an upper bound; otherwise
> using the cap alone.
I've implemented the ndistinct-based estimation and attached it as
v3-0002. The patch applies cleanly and pg-ci.yml passes.
## v3-0001
- logic remains the same
- added an `IsA(clause, RestrictInfo)` check before casting to RestrictInfo
in `mcv_can_cap()`
- added some tests
## v3-0002
- used ndistinct to calculate `non_mcv_sel = (1.0 - mcv_totalsel) /
(ndistinct - mcv_nitems)` and applied it as an upper bound for non-MCV
combinations, similarly to what `var_eq_const()` does in `selfuncs.c`
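
To illustrate the arithmetic in the item above, here is a minimal standalone sketch. It is not the code from the patch: the function name `non_mcv_selectivity` and its parameters are made up for illustration, and the guard for the case where few or no distinct combinations remain outside the MCV list is my assumption, loosely modelled on how `var_eq_const()` handles it.

```c
/*
 * Hypothetical helper (not from v3-0002): estimate the selectivity of a
 * single value combination that is absent from the multi-column MCV list.
 *
 * mcv_totalsel  sum of the frequencies of all MCV items
 * ndistinct     estimated number of distinct combinations
 * mcv_nitems    number of items in the MCV list
 * min_mcv_freq  frequency of the least common MCV item (the cap)
 */
double
non_mcv_selectivity(double mcv_totalsel, double ndistinct,
                    int mcv_nitems, double min_mcv_freq)
{
    /* Fraction of rows not covered by any MCV item. */
    double sel = 1.0 - mcv_totalsel;

    /* Distinct combinations left outside the MCV list. */
    double other_distinct = ndistinct - mcv_nitems;

    /* Assumption: only average when more than one combination remains. */
    if (other_distinct > 1.0)
        sel /= other_distinct;

    /* A non-MCV combination cannot be more common than the least MCV item. */
    if (sel > min_mcv_freq)
        sel = min_mcv_freq;

    /* Guard against a slightly negative value from rounding in the inputs. */
    if (sel < 0.0)
        sel = 0.0;

    return sel;
}
```

For example, with `mcv_totalsel = 0.5`, `ndistinct = 12` and `mcv_nitems = 8`, the average over the four remaining combinations is 0.125. A least-MCV frequency of 0.25 leaves that value alone, while one of 0.0625 caps it.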
Looking forward to your feedback!
Best regards,
Enrique.
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0002-Use-ndistinct-to-cap-non-MCV-values.patch | text/x-patch | 9.1 KB |
| v3-0001-Cap-selectivity-when-values-are-not-in-multi-colu.patch | text/x-patch | 11.4 KB |