| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-02-02 00:39:04 |
| Message-ID: | CADkLM=du+OcctrsTk+hZryUGy=0OnPep-3LdGzut1nqF391+Eg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Jan 30, 2026 at 3:07 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:
> On Fri, Jan 30, 2026 at 12:55 AM Michael Paquier <michael(at)paquier(dot)xyz>
> wrote:
>
>> On Fri, Jan 30, 2026 at 12:08:49AM -0500, Corey Huinker wrote:
>> > 3. Keeping the 2-D text array in #1, but each "row" is a list of
>> > kwargs-like pairs like the arguments used in pg_restore_attribute_stats
>> > (i.e.
>> ARRAY['null_frac','0.5','avg_width','1.0','most_common_values',...]
>> >
>> > 4. JSON. The outer structure would be an array of objects, each object
>> > would be a key-value.
>>
>> I'd still favor 4 on the ground that it's easier to edit and read,
>> which would more in line with the MCV, dependencies, ndistinct and
>> att/rel statistics. The format proposed in the attached patch is hard
>> to work with, anyway. Now, I do take your point about composite
>> record values casted into a single text value could be confusing
>> (double-quote issues, I guess?), so perhaps a text[] as in 3 would be
>> more adapted for readability.
>
>
> Hmm, maybe it isn't so bad:
>
> SELECT '{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[];
> text
> ---------------------------------------------------
> {"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}
> (1 row)
>
> SELECT
> array_to_json('{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[]);
> array_to_json
> ---------------------------------------------------
> ["{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"]
> (1 row)
>
> Mind you, this is an ANYARRAY first casted to text, if we cast the
> pg_stats_ext_exprs.most_common_values directly to JSON then it'll drill
> down into the innards of the composite values because it can see the local
> datatypes, and that breaks our ability to use regular input functions. I
> learned that the hard way when using JSON for serializing attribute stats
> stuff when this effort first began.
>
> Before seeing that, I wanted to try option 3 first, as it brings clarity
> with no real increase in tooling other than looking for repeated keywords,
> but if you're hyped for json then I'll try that first.
>
>
>> We could also force some checks based
>> the order of the arguments in the input array, so as duplicates are
>> not an issue, I guess?
>>
>
> If we're doing a kwargs-thing then I may as well just track which keywords
> were already used. We already bail out on the whole expressions array at
> the first sign of inconsistency, so it's not like we have to decide which
> of the duplicates to keep.
>
> Structurally, I feel that import_expressions() is overcomplicated, and
>> with the correct structure tracking the state of each field, I would
>> try to reduce a bit the duplication that the patch presents, aiming at
>> less callers of statatt_build_stavalues() and statatt_set_slot(),
>> perhaps.
>>
>
> I don't think we can get around those. It's a limitation of how the
> sta(kindN/opN/collN/numbersN/valuesN) system in pg_statistic works. We want
> to fill in each stakind as we find it, and we don't know how many of them
> we've already filled out. An array of records would have been better, but
> we've got 5 parallel arrays of scalars and we have to live with it.
>
Ok, so I refactored the exprs input to be a jsonb array, and the good news
is that it results in legitmately more readable test cases. See below
-- ok, with warning: extra exprs param
SELECT pg_catalog.pg_restore_extended_stats(
'schemaname', 'stats_import',
'relname', 'test',
'statistics_schemaname', 'stats_import',
'statistics_name', 'test_stat_mcelem',
'inherited', false,
'exprs', '[
{
"avg_width": 33,
"null_frac": 0,
"n_distinct": -1,
"correlation": 1,
"histogram_bounds":
"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}",
"most_common_vals": null,
"most_common_elems": "{-1,0,1,2,3}",
"most_common_freqs": null,
"elem_count_histogram":
"{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.5}",
"most_common_elem_freqs":
"{0.25,0.25,0.5,0.25,0.25,0.25,0.5,0.25}",
"bad_param": "text no one will ever parse"
}
]'::jsonb);
WARNING: malformed expr expression: "bad_param": is not an expression key
name, value ignored
The downside is that it results in 200 more lines of code in
extended_stats_funcs.c and 500 more lines of test output in
stats_import.sql.
There are a few reasons for this:
- It's up to us to validate the [{expr},{expr},...] structure ourselves,
something a 2-D text array did rather succinctly.
- null is a valid jsonb value, and we have to check for it. This quirk
alone results in 4 more test cases, though some other test cases did go
away.
- json strings are unterminated c strings, so we have to convert them first
to c strings and then to text datums to leverage the existing attribute
stats functions
I briefly thought about making the exprs input parameter jsonb[] thus
saving some structure checks, but what little that buys us is then taken
away by making the pg_dump command that much more complex.
There are probably some simplifications that can happen:
- casting all stat values to text to avoid having to deal with the
neither-fish-nor-fowl json numeric values.
- utility function for converting jbvString values to TextDatum.
- utility function to simplify json iteration in a simple key/value object
- deciding that unknown parameters aren't worth a warning
So...does this look better? Is it worth it?
| Attachment | Content-Type | Size |
|---|---|---|
| v2jsonb-0001-Add-support-for-exprs-in-pg_restore_extended.patch | text/x-patch | 94.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Álvaro Herrera | 2026-02-02 00:42:48 | relkind as an enum |
| Previous Message | Michael Paquier | 2026-02-02 00:12:30 | Re: [PATCH] Fix error message in RemoveWalSummaryIfOlderThan to indicate file removal failure |