| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Extended Statistics set/restore/clear functions. |
| Date: | 2026-01-19 02:01:28 |
| Message-ID: | 65D559AA-6A5E-4695-B9DE-9188E6E74CF9@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jan 17, 2026, at 08:13, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Jan 16, 2026 at 03:50:50PM +0900, Michael Paquier wrote:
>> Note that the restore function needs tests for the inherited record
>> case, and that we had the same error as the clear function when
>> grabbing the argument value for inherited. I have fixed the bug, did
>> not include any tests yet. Another thing that I find lacking is
>> coverage for the many paths that refer to incorrect input formats. A
>> lot of them are not covered at all, which is not good. I did not look
>> at the pg_dump portion yet.
>
> Here's a v27, with pg_dump/pg_upgrade fixed to handle the new version
> of the restore function with the table name and its schema specified
> in input arguments.
> --
> Michael
> <v27-0001-Add-pg_restore_extended_stats.patch><v27-0002-Include-Extended-Statistics-in-pg_dump.patch>
Comments for v27:
1 - 0001 - mcv.c
```
+ else
+ {
+ char *s = TextDatumGetCString(mcv_elems[index]);
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
+
+ if (!InputFunctionCallSafe(&finfo, s, ioparam, atttypmods[j],
+ (Node *) &escontext, &item->values[j]))
+ {
+ ereport(elevel,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MCV elemement \"%s\" does not match expected input type.", s)));
+ return (Datum) 0;
+ }
+
+ pfree(s);
+ }
```
* I think we should also pfree(s) before ereport. From extended_statistics_update(), this function is called with elevel=WARNING, this ereport will not immediately fail out, so that s is leaked.
* We should also free mcvlist before the return, as well as item->values and item->isnull.
* There is a typo in the error message. elemement -> element.
2 - 0001 - mcv.c
```
+ for (int i = 0; i < numattrs; i++)
+ {
+ pfree(vatuples[i]);
```
vatuples[I] is returned by SearchSysCacheCopy1(), I believe it should be free by heap_freetuple(). See the header comment of SearchSysCacheCopy().
3 - 0001 - mcv.c
```
+/*
+ * The MCV is an array of records, but this is expected as 4 separate arrays.
+ * It is not possible to have a generic input function for pg_mcv_list
+ * because the most_common_values is a composite type with element types
+ * defined by the specific statistics object.
+ */
+Datum
+import_mcvlist(HeapTuple tup, int elevel, int numattrs, Oid *atttypids,
```
I just feel the head comment needs some enhancement, because it doesn’t explain what this function does, what are input and output. The current comment seems only to only explain why this function exists.
4 - 0001- extended_stats_funcs.c
```
} StakindFlags;
```
Just feel “k” should be captain: StaKindFlags.
5 - 0001 - extended_stats_funcs.c
```
+ nspoid = get_namespace_oid(nspname, true);
+ if (nspoid == InvalidOid)
+ {
+ ereport(WARNING,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not find schema \"%s\"", stxname));
+ PG_RETURN_BOOL(false);
+ }
```
Should “stxname” be “nspname”?
6 - 0001 - extended_stats_funcs.c
extended_statistics_update is defined to return a bool, but there are multiple branches returning Datum: "PG_RETURN_BOOL(false)"; "return (Datum) 0”.
7 - 0001 - extended_stats_funcs.c
```
+static Datum
+warn_type_mismatch(Datum d, const char *argname)
+{
+ char *s = TextDatumGetCString(d);
+
+ ereport(WARNING,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not match expression %s element \"%s\" with input type",
+ argname, s));
+ return (Datum) 0;
+}
```
Why this function needs to return a Datum?
8 - 0001 - extended_stats_funcs.c
```
+static void
+expand_stxkind(HeapTuple tup, StakindFlags *enabled)
```
This function only partially assign individual fields of “enabled” without zero it out, so it relies on the caller to zero out “enabled” before calling this function. Now extended_statistics_update() has done a memset(), but I just feel it would be more robust if this function does the memset.
Also, this function silently discard unknown staKind, should we at least Assert(false) against unknown staKind?
9 - 0001 - extended_stats_funcs.c
```
+ if (!ok)
+ {
+ char *s = TextDatumGetCString(exprs_elems[correlation_idx]);
+
+ ereport(WARNING,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not match expression %s of element \"%s\" with input type",
+ extexprarginfo[CORRELATION_ELEM].argname, s));
+ return (Datum) 0;
+ }
```
Do we need to pfree(s) before return?
10 - 0001 - extended_stats_funcs.c
```
+ pgstup = heap_form_tuple(RelationGetDescr(pgsd), values, nulls);
+ pgstdat = heap_copy_tuple_as_datum(pgstup, RelationGetDescr(pgsd));
+ astate = accumArrayResult(astate, pgstdat, false, pgstypoid,
+ CurrentMemoryContext);
```
I think we should free pgstup because this is a loop.
11 - 0001
```
+<programlisting>
+ SELECT pg_restore_extended_stats(
+ 'schemaname', 'tab_schema'::name,
+ 'relname', 'tab_name'::name,
+ 'statistics_schemaname', 'stats_schema'::name,
+ 'statistics_name', 'stats_name'::name,
+ 'inherited', false,
+ 'n_distinct', '{"2, 3": 4, "2, -1": 4, "2, -2": 4, "3, -1": 4, "3, -2": 4}'::pg_ndistinct,
+ 'dependencies', '{"2 => 1": 1.000000, "2 => -1": 1.000000, "2 => -2": 1.000000}'::pg_dependencies
+ 'exprs', '{{0,4,-0.75,"{1}","{0.5}","{-1,0}",-0.6,NULL,NULL,NULL},{0.25,4,-0.5,"{2}","{0.5}",NULL,1,NULL,NULL,NULL}}'::text[]);
+</programlisting>
```
Missing a tailing comma in the “dependencies” line.
12 - 0001
```
+ Additionally, this function accepts argument name
+ <literal>version</literal> of type <type>integer</type>, which
+ specifies the server version from which the statistics originated.
```
I don’t see where “version” is implemented.
13 - 0002
```
+ else
+ appendPQExpBufferStr(pq,
+ "FROM ( "
+ "SELECT s.stxndistinct AS n_distinct, "
+ " s.stxdependencies AS dependencies "
+ "FROM pg_catalog.pg_statistics_ext AS s "
+ "JOIN pg_catalog.pg_namespace AS n "
+ "ON n.oid = s.stxnamespace "
+ "WHERE n.nspname = $1 "
+ "AND e.stxname = $2 "
+ ") AS e ");
```
Looks like e.stxname should be s.stxname.
Also, pg_catalog.pg_statistics_ext should be pg_catalog.pg_statistic_ext.
14 - 0002
```
+ /*
+ * Set up query for constraint-specific details.
+ *
+ * 19+: query pg_stats_ext and pg_stats_ext_exprs as-is 15-18: query
+ * pg_stats_ext translating the ndistinct and dependencies, 14:
+ * inherited is always NULL 12-13: no pg_stats_ext_exprs 10-11: no
+ * pg_stats_ext, join pg_statistic_ext and pg_namespace
+ */
+
+ appendPQExpBufferStr(pq,
+ "PREPARE getExtStatsStats(pg_catalog.name, pg_catalog.name) AS\n"
+ "SELECT “);
```
"constraint-specific details” looks like a copy-paste error.
15 - 0002
```
+ * The ndistinnct and depdendencies formats changed in v19, so
```
Typo: ndistinnct -> ndistinct, depdendencies -> dependencies
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Smith | 2026-01-19 02:27:15 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | Michael Paquier | 2026-01-19 01:19:17 | Re: Startup PANIC on standby promotion due to zero-filled WAL segment |