| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-02-04 01:55:47 |
| Message-ID: | aYKnI-1eViZVwbIS@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Feb 03, 2026 at 03:27:53AM -0500, Corey Huinker wrote:
>> This should also lead to better error messages on restore depending on
>> the expression data we expect (I'll check ANALYZE about the
>> possibility of partial generation of the expression stats data
>> tomorrow or so).
>
> compute_expr_stats() is a bit confusing. There's a tcnt counter that looks
> like it was meant to be independent of the loop variable, but currently
> there are no ways to skip setting the datum at the tnct index and then
> incrementing tcnt. I checked the history, and it's like that all the way
> back to the introduction of expression stats in a4d75c86bf1522.
I have finally spent some time studying a bit more how these stats are
computed, inserted into the catalogs by analyze, then loaded. Indeed
the we have a computation first of the numbers in
compute_expr_stats(), all that being introduced by a4d75c86bf1522.
The code paths that catch my attention is how the data is inserted,
though, as in serialize_expr_stats(), and we have the following thing
that skips the insertion of a pg_statistic tuple for an expression
with invalid data (I thought this should be the case, now I am sure it
is the case):
if (!stats->stats_valid)
{
astate = accumArrayResult(astate,
(Datum) 0,
true,
typOid,
CurrentMemoryContext);
continue;
}
So depending on what ANALYZE thinks in terms of what data is valid or
not, it is entirely possible that we don't insert any data all all for
some of the expressions.
When the expression data is loaded in examine_variable()@selfuncs.c,
the code expects to find a valid tuple each time it calls
statext_expressions_load(). There is a loop before doing the load
to make sure that there is any stats data available.
That pretty much answers the previous question I had that I could not
answer to yet a few mails ago: the restore function needs to be
flexible with the restore of expressions, and it looks like we should
not expect all the inputs to be set. It is OK to not restore any data
at all for some of the expressions, or skip the all set of expressions
if nothing is given in input because we found no expression stats data.
As far as I can see, the proposed patch is incorrect and inconsistent
with the backend regarding all that. For example:
CREATE TABLE test (name text);
CREATE STATISTICS stats_exprs (dependencies)
ON lower(name), upper(name)
FROM test;
So based on my read of the code, we could expect zero, one or two
pg_statistic tuples based on what ANALYZE thinks. The restore
functions expects always two elements in its inner array. IMO, we
should have the flexibility to pass down 0, 1 or 2 elements in this
1-D array, without failing as long as the elements of the input data
have valid data.
The patch allows an object pattern like this one, for example, with a
second object being empty:
'exprs', '[{"avg_width": "7", [all valid fields] .. }, {}]
Unfortunately, on such input we insert TWO pg_statistic tuple. We
could rely on the order of the items in the 1-D array, but it seems to
me that we will have a much easier life if we shape the input data
based on the following rules, matching with the policy that ANALYZE
can allow:
- An input array can have as many elements as the number of
expressions.
- The expression a single expression refers ought to be tracked in
each element, individually. This means at least the addition of a
negative attribute number in EACH element (I am entirely rejecting my
wilder idea of the expression text at this stage). If the "exprs"
data has no element for a given expression, that's fine, this has the
same meaning as no tuples found in pg_statistic for an expression.
- An element in the input array should have all its key/value pairs
set, if set for an expression.
Does this analysis make sense to you?
--
Michael
| From | Date | Subject | |
|---|---|---|---|
| Next Message | wangpeng | 2026-02-04 01:56:45 | Re: pg_dumpall --roles-only interact with other options |
| Previous Message | zengman | 2026-02-04 01:50:52 | Re: Remove unused isCommit parameter from AtEOXact_LocalBuffers |