Re: Extended statistics improvement: multi-column MCV missing values

From: Enrique Sánchez <enriqueesanchz(at)gmail(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Subject: Re: Extended statistics improvement: multi-column MCV missing values
Date: 2026-06-10 00:57:29
Message-ID: CAOCkzAmKBdQucXb=5nnYSa2wzap+duRNFdW94066F6=KH5gDTA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, thank you for the feedback!

On 9/7/2026 14:39, Ilia Evdokimov (<ilya(dot)evdokimov(at)tantorlabs(dot)com>) wrote:

> 1. mcv_can_cap() reimplements logic already present in
> dependency_is_compatible_clause(). Shall we combine the two in order to
> avoid code duplication?
>
> It's true that some of the checks that are in `mcv_can_cap()` are already
in `dependency_is_compatible_clause()`. They both return the same value
when there's a pseudo-constant and the clause is `=`, `= TRUE` or `=
FALSE`. Rest of the cases are different (IS NULL, ANY/IN, OR).

We would need to add surrounding logic because even for the matching cases,
`mcv_can_cap()` doesn't need the pseudo-constant check that
`dependency_is_compatible_clause()` requires for the `=` branch, among
other checks.
We won't gain much from combining both: the number of removed lines is
compensated with the added surrounding logic; and things will be more
convoluted.

Moreover, the two functions will evolve independently because they serve
different stats, so coupling them would create undesirable
cross-dependency. I would prefer to maintain it as it is.

2. mcv_can_cap() runs unconditionally before the if (is_or) branch, but
> can_cap is only consumed in the else/AND branch. This means mcv_can_cap() -
> including get_oprrest, syscache lookups per clause - runs for every OR
> query as wasted work. Both can_cap and covered_attnums should be moved
> inside the else branch.
>
Makes sense. I've modified it, you can check v4-0001. It only made sense to
have it there if we implement the OR path, which might be done in a future
patch.

get_ndistinct_for_keys() reimplements the ndistinct item lookup already
> present in estimate_multivariate_ndistinct(). Both functions iterate over
> MVNDistinct->items match by attributes count.
>
Right! I've created a helper and used it in both places. You can check
v4-0002.

I've checked that the patch applies cleanly and pg-ci.yml passes.

Best regards,
Enrique.

Attachment Content-Type Size
v4-0002-Use-ndistinct-to-cap-non-MCV-values.patch text/x-patch 12.7 KB
v4-0001-Cap-selectivity-when-values-are-not-in-multi-colu.patch text/x-patch 11.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Michael Paquier 2026-06-10 00:21:25 Re: pg_plan_advice