Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2026-01-20 20:48:01
Message-ID: CADkLM=dJ4bQ3e4Cc+vaVFmnrL37Y5hPJ1u0AYXE6fSkh_=+sTg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> The patch set has a clear deficiency in test coverage: a lot of code
> where we expect failures are totally uncovered, particularly things
> related to MCV lists and parameters, MCV array checks, dimensions,
> incorrect number of elements, expression imports, etc. There is a
> large number of them. I'd expect all of these cases to have proper
> coverage, with WARNINGs generated as we don't want hard failures on
> these calls, do we? Some more validation for ndistinct and
> dependencies would be welcome, as well.
>

Indeed, I've identified 29 WARNINGs that need direct coverage, and that's
not counting the ndistinct and dependencies, though for those I think we
can limit ourselves to testing ways in which those values can fail the
validation against the extended stats object rather than covering all the
ways that a value can be invalid in a vacuum.

In doing so, I noticed that we have two ways of handling some of this
warning-waterfall.

The first is to have a function call that emits some warnings and flips a
boolean:

datum = some_function(input, input, &ok);
if (!ok)
...

And the second is to directly test the datum, which is fine because the
expected datum output is never null:

datum = some_function(input, input);
if (datum == (Datum) 0)
...

There seems to be more precedent for the second pattern, but I thought I'd
bring it up for discussion before switching to that pattern wherever
possible.

> This has to be stable, meaning that we *must* be able to handle and
> reject trash input data, and also relates to the lack of coverage
> overall. I am pretty sure that if I begin to inject more buggy values
> in other areas of these parameters, things could get funky.
>

Agreed.

> I am wondering if we should not cut the pie into two parts here: the
> dependencies and ndistinct data is actually so much more
> straight-forward as they have predictive input patterns that we can
> discard easiy if they are incorrect. The MCV list part with its
> import logic and cross-validation of the inputs is something else
> based on the attribute types. I don't think that we are far from a
> good solution, still that would be one strategy to get something done
> if we cannot get the MCV part completely right.
>

I'm not sure what cutting the pie would mean in terms of code
reorganization, so I can't comment on whether I think it's a good idea. For
the time being I'm going to add the tests in stats_import.sql, and if it
gets unweildly then we can consider splitting that.

>
> A lot of the code is still largely under-documented, with zero
> explanation about pg_restore_extended_stats(), no explanation about
> the individual fields that can be set in pg_restore_extended_stats().
> It's impossible to understand for a reader what statext_mcv_import()
> does based on its inputs and what a caller can give in input.
>

Noted.

> least what they refer to in the catalogs. Having this stuff implied
> in the code with zero reference in the documentation is like driving a
> car blind, and I doubt it's good for one's health.
>

If anyone could come up with a means of safely driving via echolocation, it
would be you. But let's not let it come to that. Thanks for the feedback. I
can post an intermediate v29 to appease the cfbot, but if there's no demand
then I'll wait til I have the test cases and documentation in place.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Manni Wood 2026-01-20 20:48:58 Re: Speed up COPY FROM text/CSV parsing using SIMD
Previous Message Konstantin Knizhnik 2026-01-20 20:45:44 Re: Mystery with REVOKE PRIVILEGE