Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2026-01-21 00:00:51
Message-ID: aXAXMwHYF1qyUaP5@paquier.xyz
Lists: pgsql-hackers

On Tue, Jan 20, 2026 at 03:48:01PM -0500, Corey Huinker wrote:
> Indeed, I've identified 29 WARNINGs that need direct coverage, and that's
> not counting the ndistinct and dependencies, though for those I think we
> can limit ourselves to testing ways in which those values can fail the
> validation against the extended stats object rather than covering all the
> ways that a value can be invalid in a vacuum.

Yeah, that should be OK for these two. We should have sufficient
coverage for trash input based on the definition of the extstat
object, with the input functions dealing with most of the details.

> In doing so, I noticed that we have two ways of handling some of this
> warning-waterfall.
>
> The first is to have a function call that emits some warnings and flips a
> boolean:
>
> datum = some_function(input, input, &ok);
> if (!ok)
> ...
>
> And the second is to directly test the datum, which is fine because the
> expected datum output is never null:
>
> datum = some_function(input, input);
> if (datum == (Datum) 0)
> ...
>
> There seems to be more precedent for the second pattern, but I thought I'd
> bring it up for discussion before switching to that pattern wherever
> possible.

Hmm. Not sure yet which one is better here, TBH. The "ok" pattern
feels a bit better to me because a Datum == 0 could also be fine in
some cases, be they existing or future ones, so that's more flexible
by design.

>> I am wondering if we should not cut the pie into two parts here: the
>> dependencies and ndistinct data is actually so much more
>> straight-forward as they have predictive input patterns that we can
>> discard easily if they are incorrect. The MCV list part with its
>> import logic and cross-validation of the inputs is something else
>> based on the attribute types. I don't think that we are far from a
>> good solution, still that would be one strategy to get something done
>> if we cannot get the MCV part completely right.
>
> I'm not sure what cutting the pie would mean in terms of code
> reorganization, so I can't comment on whether I think it's a good idea. For
> the time being I'm going to add the tests in stats_import.sql, and if it
> gets unwieldy then we can consider splitting that.

When I review a patch and I want to do a clean split, I tend to
apply the whole thing, then remove the parts that can be split out,
reapply, and then generate a diff of the previously-removed code.
It's just easier to delete code. For this patch set, it would mean
removing the MCV variables and related structures. It is actually
not that bad based on how you have structured the restore function;
there is no strong dependency between the code paths treating each
stxkind. This is just me thinking of a possible strategy if this
whole thing cannot make it into the tree by the freeze deadline.
Some of it may be better than nothing of it, and it is possible to
add more parameters later with the right basics in place. For me,
review and integration tend to be easier as an incremental set of
small-ish steps, but I think that you are aware of my way of doing
things :D
--
Michael
