Re: Add expressions to pg_restore_extended_stats()

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Add expressions to pg_restore_extended_stats()
Date: 2026-02-04 05:14:06
Message-ID: CADkLM=cuvJBNCQ7Ge7zrc52s_ci=g=FbXiOum3i86MCUpSNEDQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> My initial read of statext_expressions_load() was actually a bit
> wrong, examine_variable() will always go through the load(), but if
> serialize_expr_stats() has skipped one expression due to invalid data,
> we just store a "(Datum) 0" aka NULL in the array. So there is some
> data, and that also means that we will always have a number of
> elements equal to the number of expressions (accumArrayResult adds the
> value). Sorry for the confusion!
>

That is an enormous relief.

>
> It also means that it should be possible to write
> import_pg_statistic() so as it does the same thing as the
> serialization. How about just allowing a NULL in the JSON array as an
> equivalent of no data found for an expression? If you do that, your
> initial idea about requiring the same number of elements in the array
> would be consistent with the backend, without having to include any
> information about the expression text or its attnum in the elements of
> the JSON array.
>

Whew.

>
> That would require a couple of extra lines in
> import_pg_statistic(), so as accumArrayResult() could be called with
> disnull=true to do the same as the backend, with "ok" set to true to
> count for the fact that it is a valid pattern.
>

+1

> > - An element in the input array should have all its key/value pairs
> >> set, if set for an expression.
> >>
> >
> > That won't fly, because there will be pg_dumps from prior versions that
> > won't know about future keys which would then be required.
>
> Okay, point taken.
>

We also get *much* shorter regression tests if allow values irrelevant to
the test to be excluded.

>
> > It does, but I'm not seeing how we line up the right stats to the right
> > element in the event of an undercount. I've been digging around in the
> > planner, and it seems like it just takes the
> > Anum_pg_statistic_ext_data_stxdexpr, expands it, and then
> > statext_expressions_load just assumes that the array will have something
> at
> > that index. I hope I'm misreading this, because we may have uncovered
> > something broken.
>
> It looks like your take is right and that my first impression was
> wrong: if an expression has invalid data accumArrayResult() stores a
> NULL value, so we will never be short. So there is some data, and
> your patch could be tweaked so as an expression in a set is marked as
> invalid by storing a NULL in the result array at its location.

I'll switch to adding the nulls to the array result, and add tests for both
leading and trailing expr missing.

Mild change of subject, it seems that we can't get the expression fake
attnum context into the errors we re-throw in statatt_build_stavalues - it
might make sense to to bring a version of that function into
extended_stats_funcs where we can add the extra parameters for context (and
avoid the need for a text datum version of some longish strings that we've
already just converted from converting json-string to c-string. If I did
make a new function, then that'd be 2 statatt_* functions that no longer
need to be visible outside of attribute_stats.c. Thoughts on both making
the new function, and maybe sending a few of these statatts back to
static-land?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Manni Wood 2026-02-04 05:38:27 Re: Speed up COPY FROM text/CSV parsing using SIMD
Previous Message John Naylor 2026-02-04 05:13:35 Re: refactor architecture-specific popcount code