| From: | Enrique Sánchez <enriqueesanchz(at)gmail(dot)com> |
|---|---|
| To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
| Cc: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Subject: | Re: Extended statistics improvement: multi-column MCV missing values |
| Date: | 2026-06-10 00:57:29 |
| Message-ID: | CAOCkzAmKBdQucXb=5nnYSa2wzap+duRNFdW94066F6=KH5gDTA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi, thank you for the feedback!
On 9/7/2026 14:39, Ilia Evdokimov (<ilya(dot)evdokimov(at)tantorlabs(dot)com>) wrote:
> 1. mcv_can_cap() reimplements logic already present in
> dependency_is_compatible_clause(). Shall we combine the two in order to
> avoid code duplication?
>
> It's true that some of the checks that are in `mcv_can_cap()` are already
in `dependency_is_compatible_clause()`. They both return the same value
when there's a pseudo-constant and the clause is `=`, `= TRUE` or `=
FALSE`. Rest of the cases are different (IS NULL, ANY/IN, OR).
We would need to add surrounding logic because even for the matching cases,
`mcv_can_cap()` doesn't need the pseudo-constant check that
`dependency_is_compatible_clause()` requires for the `=` branch, among
other checks.
We won't gain much from combining both: the number of removed lines is
compensated with the added surrounding logic; and things will be more
convoluted.
Moreover, the two functions will evolve independently because they serve
different stats, so coupling them would create undesirable
cross-dependency. I would prefer to maintain it as it is.
2. mcv_can_cap() runs unconditionally before the if (is_or) branch, but
> can_cap is only consumed in the else/AND branch. This means mcv_can_cap() -
> including get_oprrest, syscache lookups per clause - runs for every OR
> query as wasted work. Both can_cap and covered_attnums should be moved
> inside the else branch.
>
Makes sense. I've modified it, you can check v4-0001. It only made sense to
have it there if we implement the OR path, which might be done in a future
patch.
get_ndistinct_for_keys() reimplements the ndistinct item lookup already
> present in estimate_multivariate_ndistinct(). Both functions iterate over
> MVNDistinct->items match by attributes count.
>
Right! I've created a helper and used it in both places. You can check
v4-0002.
I've checked that the patch applies cleanly and pg-ci.yml passes.
Best regards,
Enrique.
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0002-Use-ndistinct-to-cap-non-MCV-values.patch | text/x-patch | 12.7 KB |
| v4-0001-Cap-selectivity-when-values-are-not-in-multi-colu.patch | text/x-patch | 11.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Michael Paquier | 2026-06-10 00:21:25 | Re: pg_plan_advice |