Quick Links

Re: Extended Statistics set/restore/clear functions.

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: Extended Statistics set/restore/clear functions.
Date:	2025-11-17 17:18:55
Message-ID:	CADkLM=ei_07zDAyTrabZRc1tbRwzWLBd5yaP1zzqBTXxpvS-SA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Nov 17, 2025 at 1:56 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Fri, Nov 14, 2025 at 03:25:27PM +0900, Michael Paquier wrote:
> > Thanks for the new versions, I'll also look at all these across the
> > next couple of days. Probably not at 0005~ for now.
>
> 0001 and 0002 from series v13 have been applied to change the output
> functions.
>

Thanks!

Though, I was thinking some more about the output format. Using
jsonb_pretty() makes it readable in one way, and very clumsy in other ways.
Instead, I'm going to try doing the following:

replace (ndist_string_value, '},', E'}\n,')

This will result in the output value being formatted in exactly the way
described in the commit messages.

Of course, we could make the the actual default format by changing
appendStringInfoString(&str, ", ") instead.

One might argue that the output shouldn't get too flowery, but we're
already adding spaces between items and array elements, and we've already
made extensive changes favoring readability over compactness.

Review of changes to input function patches:

I'm curious about the re-parameterization of error messages involving
PG_NDISTINCT_KEY_ATTRIBUTES, PG_NDISTINCT_KEY_NDISTINCT, and similar keys
in dependencies. I like the parameterized version better, and was confused
as to why it was removed. Did you change your mind, or was it done for ease
of translation?

Wording changes all look good. Use of pg_input_error_info() makes sense.

>
> There is an extra thing that bugs me as incorrect for the pg_ndistinct
> input, something I have not tackled myself yet. Your patch checks
> that subsets of attributes are included in the longest set found, but
> it does not match the guarantees we have in mvndistinct.c: we have to
> check that *all* the combinations generated by generator_init() are
> satisfied based on the longest of attributes detected. For example,
> this is thought as correct in the input function:
> SELECT '[{"attributes" : [-1,2], "ndistinct" : 1},
> {"attributes" : [-1,2,3], "ndistinct" : 3}]'::pg_ndistinct;
>
> However it is obviously not correct as we are missing an element for
> the attributes [-1, 3]. The simplest solution would be to export the
> routines that generate the groups now in mvndistinct.c. Also we
> should make sure that the number of elements in the arrays match with
> the number of groups we expect, not only the elements. I don't think
> that we need to care much about the values, but we ought to provide
> stronger guarantees for the attributes listed in these elements.
>

I had a feeling that was going to be requested. My question would be if
that we want to stick to modeling the other combinations after the first
longest combination, last longest, or if we want to defer those checks
altogether until we have to validate against an actual stats object?

>
> Except for this argument, the input of pg_ndistinct feels OK in terms
> of the guarantees that we'd want to enforce on an import. The same
> argument applies in terms of attribute number guarantees for
> pg_dependencies, based on DependencyGenerator_init() & friends in
> dependencies.c. Could you look at that?
>

Yes. I had already looked at it to verify that _all_ combinations were
always generated (they are), because I had some vague memory of the
generator dropping combinations that were statistically insignificant. In
retrospect, I have no idea where I got that idea.

>
> For pg_dependencies, we also require some checks on the value for
> "dependency", of course, making sure that this matches with what's
> expected with the "largest" sets of attributes. In this case, we need
> to track the union of "dependency" and "attributes", with "attributes"
> having at least one element.
>

This is fairly simple to do. The dependency attnum is just appended to the
list of attnums, and the combinations are generated the same as ndistinct,
though obviously there are no single elements.

There's probably some common code between the lists to be shared, differing
only in how they report missing combinations.

> The tests of pg_dependencies need also to be extended more (begun that
> a bit, far from being complete and I'm lacking of time this week due
> to a conference). One thing that I would add are nested JSON objects
> in the paths where we expect values, for example. Please note that I
> have done a brush of 0004, while on it, cleaning up typos,
> inconsistencies and making the error codes consistent with the
> ndistinct case where possible. This is not ready, but that's at least
> it's a start to rely on.
>

>
> In terms of committable bits, it would be better to apply the input
> functions once both parts are ready to go. For now I am attached a
> v14 with the work I've put into them. 0005~ are not reviewed yet, as
> mentioned previously. The changes in pg_dependencies are actually
> straight-forward to figure out (well, mostly) once the pg_ndistinct
> changes are OK in shape.
> --
> Michael

In response to

Re: Extended Statistics set/restore/clear functions. at 2025-11-17 06:56:06 from Michael Paquier

Responses

Re: Extended Statistics set/restore/clear functions. at 2025-11-17 23:19:58 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2025-11-17 17:28:34	Re: Adding basic NUMA awareness
Previous Message	ocean_li_996	2025-11-17 17:18:12	minor improvement in snapbuild: use existing interface and remove fake code