Re: Extended statistics improvement: multi-column MCV missing values

From: Enrique Sánchez <enriqueesanchz(at)gmail(dot)com>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Subject: Re: Extended statistics improvement: multi-column MCV missing values
Date: 2026-06-07 15:22:41
Message-ID: CAOCkzAnYgzn0ZN4KUPngyiHQdcKHbF=KAxmCR-ObJVKirQvVLw@mail.gmail.com
Lists: pgsql-hackers

Hi,

> 1. Full-dimensional top-level AND equality miss, using the
> ndistinct-based average estimate when matching ndistinct statistics
> exist, with the least-MCV frequency as an upper bound; otherwise
> using the cap alone.

I've implemented the ndistinct-based estimation and attached it as
v3-0002. The patch applies cleanly and pg-ci.yml passes.

## v3-0001
- logic remains the same
- added an `IsA(clause, RestrictInfo)` check before casting to `RestrictInfo`
in `mcv_can_cap()`
- added some tests

## v3-0002
- used ndistinct to calculate `non_mcv_sel = (1.0 - mcv_totalsel) /
(ndistinct - mcv_nitems)` and applied it as an upper bound for non-MCV
combinations, similarly to what `var_eq_const()` does in `selfuncs.c`
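
To make the arithmetic concrete, here is a minimal standalone sketch of the
estimate, not the code from v3-0002. The function name, its parameters, and
the `other_distinct > 1` guard are assumptions for illustration. The guard
follows my recollection of how `var_eq_const()` avoids dividing by a
non-positive count, and the final cap is the least-MCV frequency bound
described in the quoted text.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): selectivity of one value
 * combination that is absent from a multi-column MCV list.
 *
 *   mcv_totalsel   - sum of the frequencies of all MCV items
 *   ndistinct      - estimated number of distinct combinations
 *   mcv_nitems     - number of items in the MCV list
 *   least_mcv_freq - frequency of the least common MCV item
 */
double
non_mcv_combination_sel(double mcv_totalsel, double ndistinct,
                        int mcv_nitems, double least_mcv_freq)
{
    double other_distinct = ndistinct - mcv_nitems;
    double sel = 1.0 - mcv_totalsel;

    assert(mcv_totalsel >= 0.0 && mcv_totalsel <= 1.0);

    /* Spread the remaining frequency evenly over the non-MCV combinations. */
    if (other_distinct > 1.0)
        sel /= other_distinct;

    /* A non-MCV combination should not be more common than the rarest MCV item. */
    if (mcv_nitems > 0 && sel > least_mcv_freq)
        sel = least_mcv_freq;

    /* Keep the result a valid probability. */
    if (sel < 0.0)
        sel = 0.0;
    if (sel > 1.0)
        sel = 1.0;

    return sel;
}
```

For example, with `mcv_totalsel = 0.6`, `ndistinct = 104` and
`mcv_nitems = 4`, the remaining 0.4 is spread over 100 combinations, giving
roughly 0.004 per combination. When ndistinct is close to `mcv_nitems`, the
average can exceed the least-MCV frequency, and the cap takes over.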

Looking forward to your feedback!

Best regards,
Enrique.

Attachment                                                       Content-Type  Size
v3-0002-Use-ndistinct-to-cap-non-MCV-values.patch                text/x-patch  9.1 KB
v3-0001-Cap-selectivity-when-values-are-not-in-multi-colu.patch  text/x-patch  11.4 KB
