| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Extended Statistics set/restore/clear functions. |
| Date: | 2026-01-28 06:08:17 |
| Message-ID: | aXmn0biuVWTtu2TF@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jan 28, 2026 at 12:44:33AM -0500, Corey Huinker wrote:
> The short answer overcaution. The longer answer is we ingest it because
> pg_mcv_list() returns it (
> https://www.postgresql.org/docs/current/functions-statistics.html) and I
> assumed that it must be necessary for some corner-case input, otherwise why
> would pg_mcv_list() export it? In retrospect, its lack of presence in
> pg_stats view should have been a clue that we could get by without it. I'm
> happy to rip it out and submit another patchset.
It seems to me that this comes down to the text[] representation when
we read this data from the catalogs, where we can just rely on NULL
being in the value, and the official marker in this case:
https://www.postgresql.org/docs/current/arrays.html#ARRAYS-INPUT
That's what your code relies on already anyway when deconstructing the
input received from the restore function. Doubling on that with an
extra array in input is error-prone IMO, as the overread issue I've
spotted today is proving. The other two 1-dimension arrays for freqs
and base_freqs are of course needed, we cannot and should not rebuild
them on-the-fly..
For the in-core initial computation, statext_mcv_build() relies on the
data gathered in StatsBuildData during ANALYZE before building the set
of MCV data, where my guess is that it is just more useful to store
this state as-is, and it's been built based on the Datum lookups.
Perhaps we have a couple of specific cases where checking for we want
some NULL-ness knowledge? It would be less expensive than a full
array deconstruction, for sure, especially if the MCVs are large text
values.
--
Michael
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Corey Huinker | 2026-01-28 06:12:17 | Re: Extended Statistics set/restore/clear functions. |
| Previous Message | Amit Kapila | 2026-01-28 05:47:57 | Re: pgsql: Prevent invalidation of newly synced replication slots. |