| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-02-02 07:11:55 |
| Message-ID: | aYBOO3TWaTaINX9G@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, Feb 01, 2026 at 07:39:04PM -0500, Corey Huinker wrote:
> The downside is that it results in 200 more lines of code in
> extended_stats_funcs.c and 500 more lines of test output in
> stats_import.sql.
>
> There are a few reasons for this:
>
> - It's up to us to validate the [{expr},{expr},...] structure ourselves,
> something a 2-D text array did rather succinctly.
> - null is a valid jsonb value, and we have to check for it. This quirk
> alone results in 4 more test cases, though some other test cases did go
> away.
> - json strings are unterminated c strings, so we have to convert them first
> to c strings and then to text datums to leverage the existing attribute
> stats functions
This one would not actually exist if using a 1-D text[] with elements
made of key/value/key/value as the array deconstruction would take
care of that, wouldn't it?
> There are probably some simplifications that can happen:
>
> - casting all stat values to text to avoid having to deal with the
> neither-fish-nor-fowl json numeric values.
Hmm. We'd still need to deal with buggy inputs even if this
conversion is enforced in pg_dump.
> - utility function for converting jbvString values to TextDatum.
> - utility function to simplify json iteration in a simple key/value object
> - deciding that unknown parameters aren't worth a warning
Do you actually need check_valid_expr_argnames() at all? At the end
of import_pg_statistic() we know that all the parameters have been
found. So we could just check on the numbers of values fetched and
complain based on NUM_ATTRIBUTE_STATS_ELEMS? That would remove some
code, not impact the safety. key_lookup_notnull() could be changed to
return three possible states rather than a boolean: okay and NULL
value, okay and not-NULL value, error if key not found in JSON blob or
incorrect parsing of the input value (aka infinite, input function
error, etc). Then increment the counter if one of the first two states
is found as NULL can be valid for some of the key/value pairs.
bv_get_float4_datum() and jbv_get_int4_datum() are alike, they could
be refactored as a single function that takes a type name in put for
the error message and an input function pointer. That would cut some
code.
> So...does this look better? Is it worth it?
Well, it does to me. The pg_dump bits based on jsonb_build_object()
are rather attractive in shape. This addresses the concerns I had
about item ordering and unicity of the keys in a natural way through
the data type filtering.
I don't see a need for key_lookup(), its sole caller is
key_lookup_notnull().
+ 'exprs', '{{0,4,-0.75,"{1}","{0.5}","{-1,0}",-0.6,NULL,NULL,NULL},{0.25,4,-0.5,"{2}","{0.5}",NULL,1,NULL,NULL,NULL}}'::text[]);
This is in the documentation, and would need fixing.
+WARNING: invalid expression elements most_common_vals and most_common_freqs: conflicting NULL/NOT NULL
For this warning in the regression tests, you should only need one
element, reducing the number of input lines?
The patch has a query checking the contents of pg_stats_ext_exprs for
test_mr_stat, no imports. This links to the multirange case mentioned
in import_pg_statistic().
Something that looks missing here is that we do not attach to which
negative attribute an item of "exprs" links to. It seems to me that
this needs to include an extra key called "attribute" with a negative
associated with it, so as we can rely on the attnum instead of the
order of the items restored.
+ * Create a pg_statisitc tuple from an expression container.
Typo here.
+ * The attytypids, attytypmods, and atttypcols arrays have all the
Incorrect variable names.
+ errmsg("malformed expressions \"%s\": must be a regular numeric",
errmsg not project-style.
+ errmsg("malformed expressions \"%s\": exprected array of %d, found %d",
Bis repetita, and s/exprected/expected/.
+ ereport(WARNING,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot specify parameter \"%s\".",
+ extarginfo[EXPRESSIONS_ARG].argname),
+ errhint("Extended statistics object "
+ "does not support statistics of this type."));
errmsg on a line line, object name to be included in the errhint. No
dot at the end of errmsg.
+ errmsg("invalid expression element \"%s\": must be type string"
This one also should be reworded.
--
Michael
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-02-02 07:15:52 | Re: walsender: Assert MyReplicationSlot is set before use |
| Previous Message | Peter Smith | 2026-02-02 07:04:01 | use the malloc macros in pg_dump.c |