Re: Add expressions to pg_restore_extended_stats()

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Add expressions to pg_restore_extended_stats()
Date: 2026-02-05 10:21:23
Message-ID: aYRvI45hkV5oKvIC@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 04, 2026 at 11:42:16PM -0500, Corey Huinker wrote:
>> v5-0001 still requires a lot of changes at quick glance, as of:
>> - Some diffs not required in some comment blocks.
>
> Kept the s/stxdexprs/stxdexpr, as that is the correct column name, the
> other changes are reverted.

As in the attribute of pg_statistic_ext_data. Fixed this one
separately.

> We did need to do this, and it has been implemented, along with a new test
> on a new statistics object which has a to_tsvector(text_column) expression
> in it.

Thanks. I guess that makes more sense now. The coverage is good to
have.

> Multirange types in expressions generate exactly 2
> stakinds: STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM (6)
> and STATISTIC_KIND_BOUNDS_HISTOGRAM (7), neither of which are exposed by
> pg_stats_ext_exprs, so they can't be imported, so at present there is
> nothing to test and we can skip the multirange step-down for now.
> Inspecting the stavalues1 and stavalues2 values in pg_statistic_ext_data
> gives me the impression that we don't need the step-down, as the values
> appear to be regular range types, and the exiting function to get the
> element type should suffice.
>
> We should probably modify the view to include these new-ish statistic
> kinds, but that's for another patch.

Hmm. Okay. I'll study more this point.

> Moving to the found[] and vals[] arrays cleans up the logic a lot, I think,
> which is all the more important now that we keep building the expressions
> after components of that statistic fail.

v6-0001 has less fuzz, thanks for cleaning up the whole. I am looking
at the patch, and immediately noted two concepts that bump into my eyes
and look rather non-reliable.

+ if (!sta_ok)
+ *exprs_is_perfect = false;
+ isnull = false;

This bit looks incorrect to me? If !sta_ok (or !row_is_perfect in
import_pg_statistic()), then why is a non-NULL value inserted into the
resulting array? If we fail to parse even one field or miss one key,
we should force NULL for this single expression in the worst case and
attempt to move on with the rest. But it does not matter anyway
because import_expressions() would fail the import. Why don't we just
skip the rest of the expressions then? We know that the resulting
array will go to the garbage bit and that the restore failed, issuing
a WARNING for the statext object.

I think that import_expressions() has its logic going backwards, by
this I mean that intializing exprs_is_perfect to true could be risky.
It seems to me that we should do the exact opposite: initialize it at
false, and switch to true *if and only if* all the correct conditions
we want are reached. I'd suggest set of gotos and a single exit path
at the end of import_expressions() where exprs_is_perfect is set to
true to let the caller know that the expression can be safely used.
Remember import_mcv(), as one example.

Similarly, the same concept should be applied to import_pg_statistic().
row_is_perfect should be false to start, and switched to true once we
are sure that everything we want is valid.

Also, I think that the format of the dump should be better when it
comes to a set of expressions where some of them have invalid data.
Even if the stats of an expression are invalid, pg_dump builds a JSON
blob for the element of the expression with all the keys and their
values set to NULL. I'd suggest to just publish a NULL for the
element where invalid stats are found. That makes the dumps shorter
as well.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-02-05 10:22:19 Re: Truncate logs by max_log_size
Previous Message Nitin Motiani 2026-02-05 09:56:02 [PATCH] Support reading large objects with pg_read_all_data