Re: Add expressions to pg_restore_extended_stats()

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Add expressions to pg_restore_extended_stats()
Date: 2026-02-04 04:58:41
Message-ID: aYLSAeA0RQ8ZKclf@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 03, 2026 at 11:05:58PM -0500, Corey Huinker wrote:
>> So depending on what ANALYZE thinks in terms of what data is valid or
>> not, it is entirely possible that we don't insert any data at all for
>> some of the expressions.
>
> +1
>
> "valid" in this case could also mean "totally inconclusive and therefore
> not worth storing".

Yeah, I guess so in the context of ANALYZE.

My initial read of statext_expressions_load() was actually a bit
wrong, examine_variable() will always go through the load(), but if
serialize_expr_stats() has skipped one expression due to invalid data,
we just store a "(Datum) 0" aka NULL in the array. So there is some
data, and that also means that we will always have a number of
elements equal to the number of expressions (accumArrayResult adds the
value). Sorry for the confusion!

It also means that it should be possible to write
import_pg_statistic() so as it does the same thing as the
serialization. How about just allowing a NULL in the JSON array as an
equivalent of no data found for an expression? If you do that, your
initial idea about requiring the same number of elements in the array
would be consistent with the backend, without having to include any
information about the expression text or its attnum in the elements of
the JSON array.

For example, if a statext object has three expressions, the first and
second ones have no data but not the third, then we go like that:
[null,null,{"avg_width": "7", ... }]

A NULL value is appended for the first two ones in the result
ArrayBuildState to mark them as not valid, and the third one has valid
data. When loading this data statext_expressions_load() will see the
"invalid" value and act accordingly.

That would require a couple of extra lines in
import_pg_statistic(), so as accumArrayResult() could be called with
disnull=true to do the same as the backend, with "ok" set to true to
count for the fact that it is a valid pattern.

> And before you ask, there is *nothing* in pg_statistic_ext_data that
> indicate which expression it belongs to in the pg_statistic_ext beyond its
> position within the array, and we now know that isn't guaranteed.

Yeah, I wanted to rely on something external, but it does not seem we
can get that, so. Your idea to rely on the number of elements in the
array to be the same as the number of expressions would enforce that.
If the MCV inputs complain, well, at least the user would get some
information about what is wrong.

> - An element in the input array should have all its key/value pairs
>> set, if set for an expression.
>>
>
> That won't fly, because there will be pg_dumps from prior versions that
> won't know about future keys which would then be required.

Okay, point taken.

> It does, but I'm not seeing how we line up the right stats to the right
> element in the event of an undercount. I've been digging around in the
> planner, and it seems like it just takes the
> Anum_pg_statistic_ext_data_stxdexpr, expands it, and then
> statext_expressions_load just assumes that the array will have something at
> that index. I hope I'm misreading this, because we may have uncovered
> something broken.

It looks like your take is right and that my first impression was
wrong: if an expression has invalid data accumArrayResult() stores a
NULL value, so we will never be short. So there is some data, and
your patch could be tweaked so as an expression in a set is marked as
invalid by storing a NULL in the result array at its location.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Soumya S Murali 2026-02-04 05:00:27 Re: Fix how some lists are displayed by psql \d+
Previous Message Tender Wang 2026-02-04 04:54:58 Re: Convert NOT IN sublinks to anti-joins when safe