Re: Add expressions to pg_restore_extended_stats()

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Add expressions to pg_restore_extended_stats()
Date: 2026-02-03 03:34:14
Message-ID: CADkLM=fOmCRxb-ssN+e1ZGTmsL-wTzHDD4a012rL=fUYAz-cfg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
> Not including a trace regarding to which expression a row refers to
> sounds like a design mistake to me, particularly because JSON is, by
> design, JSON, and we don't have ordering requirements.

I'd have no issue including the expr definition in the existing object
container - an optional parameter that allow (no warning) but which has no
effect. We could then compare that optional parameter against the text
representation of the node and issue a non-failing warning on any
differences.

But in a way we're already getting this type checking if the expressions
have different datatypes and either of them have most_common_values,
histogram_bounds, or most_common_elems - as all of those require input
coersion of the array values to datatypes of the expressions, and any
failure in any of those dooms the whole exprs array.

It's true that json jumbles orders of keys, but it does not jumble the
order of elements inside an array. The 2-D text[] export has exactly this
same problem, btw.

> If we don't
> include an expression text, I'm OK to give up on this idea. But let's
> at least include a negative attribute number with an "attribute"
> field.

That feels like the same thing as including the expr definition as a
parameter, and I'm as amenable to this as I am to the expr node text.

> We could cross-check it with the number of expressions defined
> in the statext object.
>

Yep.

>
> On second though, as we already use negative attribute numbers for
> ndistinct and dependencies, perhaps it's not a bad choice to use a
> negative number anyway. As the attribute number assigned depends on
> the order of the elements in pg_stats_ext.exprs, I'd suggest to tweak
> the pg_dump query to rely on that rather than ORDINALITY and the order
> where the rows of pg_stats_ext_exprs are scanned. Using the order of
> the elements in the definition of the stats object is predictable. A
> sequential scan of a catalog offers no real guarantees.
>

This has come up before. pg_stats, infuriatingly, has an attname but not an
attnum.

pg_stats_ext_exprs, equally infuriatingly, does not expose
pg_statistic_ext.oid, which means to join back from pg_stats_ext_exprs to
pg_statistic_ext, we'd have to re-join the namespace and...yeah, it gets
yucky.

But instead of doing that, we could fetch
the pg_get_statisticsobjdef_expressions() when we first fetch the extstats
definitions, use that as a parameter in the dump query, and unnest that
with ordinality and join back to pg_stats_ext_expr. I think that's the best
way of enforcing the order of the expressions and not just trusting the
unnest() in the view definition.

In other news, pg_get_statisticsobjdef_expressions appears to be
undocumented, though it exists in v14 when extended stats expressions were
introduced, so I think we're safe there.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-02-03 03:37:36 Re: walsender: Assert MyReplicationSlot is set before use
Previous Message Michael Paquier 2026-02-03 03:27:49 Re: IO wait events for COPY FROM/TO PROGRAM or file