| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-01-30 08:07:30 |
| Message-ID: | CADkLM=fZci8mBEoKX9Gk-+0jXrVmQBuTErP-0JYgE1iUL+EX2A@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Jan 30, 2026 at 12:55 AM Michael Paquier <michael(at)paquier(dot)xyz>
wrote:
> On Fri, Jan 30, 2026 at 12:08:49AM -0500, Corey Huinker wrote:
> > 3. Keeping the 2-D text array in #1, but each "row" is a list of
> > kwargs-like pairs like the arguments used in pg_restore_attribute_stats
> > (i.e. ARRAY['null_frac','0.5','avg_width','1.0','most_common_values',...])
> >
> > 4. JSON. The outer structure would be an array of objects, each object
> > would be a key-value.
>
> I'd still favor 4 on the grounds that it's easier to edit and read,
> which would be more in line with the MCV, dependencies, ndistinct and
> att/rel statistics. The format proposed in the attached patch is hard
> to work with, anyway. Now, I do take your point that composite
> record values cast into a single text value could be confusing
> (double-quote issues, I guess?), so perhaps a text[] as in 3 would be
> better suited for readability.
Hmm, maybe it isn't so bad:
SELECT '{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[];
text
---------------------------------------------------
{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}
(1 row)
SELECT
array_to_json('{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[]);
array_to_json
---------------------------------------------------
["{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"]
(1 row)
Mind you, this is an ANYARRAY first cast to text. If we cast
pg_stats_ext_exprs.most_common_vals directly to JSON, it will drill
down into the innards of the composite values because it can see the
underlying datatypes, and that breaks our ability to use the regular
input functions. I learned that the hard way when using JSON to
serialize attribute stats when this effort first began.
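A quick demonstration of the drill-down with a throwaway composite
(nothing to do with the actual stats data, just the casting behavior):

SELECT row_to_json(ROW(1, 0.5));   -- decomposed per field: {"f1":1,"f2":0.5}
SELECT to_json(ROW(1, 0.5)::text); -- stays one opaque string: "(1,0.5)"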
Before seeing the text[] output above, I wanted to try option 3 first, as
it brings clarity with no real increase in tooling other than looking for
repeated keywords, but if you're hyped for JSON then I'll try that first.
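For the record, I'd expect the JSON to look roughly like this (a
hypothetical payload just to show the shape, with keys mirroring the
pg_restore_attribute_stats() argument names; not a final format):

SELECT '[
  {"null_frac": "0.5", "avg_width": "4",
   "most_common_vals": "{1,2,3}",
   "most_common_freqs": "{0.3,0.2,0.1}"}
]'::json;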
> We could also force some checks based on the order of the arguments
> in the input array, so that duplicates are not an issue, I guess?
>
If we're doing a kwargs-thing then I may as well just track which keywords
were already used. We already bail out on the whole expressions array at
the first sign of inconsistency, so it's not like we have to decide which
of the duplicates to keep.
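In SQL terms the check amounts to this (a toy example assuming the odd
array positions are the keywords; the real thing would of course live in
C inside import_expressions()):

SELECT kw
FROM unnest(ARRAY['null_frac','0.5','null_frac','0.7'])
     WITH ORDINALITY AS t(kw, pos)
WHERE pos % 2 = 1
GROUP BY kw
HAVING count(*) > 1;   -- returns any keyword that appears twice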
> Structurally, I feel that import_expressions() is overcomplicated, and
> with the correct structure tracking the state of each field, I would
> try to reduce a bit the duplication that the patch presents, aiming at
> less callers of statatt_build_stavalues() and statatt_set_slot(),
> perhaps.
>
I don't think we can get around those. It's a limitation of how the
sta(kindN/opN/collN/numbersN/valuesN) system in pg_statistic works. We want
to fill in each stakind as we find it, and we don't know how many of them
we've already filled out. An array of records would have been better, but
we've got 5 parallel arrays of scalars and we have to live with it.
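For anyone following along at home, the parallel slots are plain to see
in the catalog (the staop/stacoll/stanumbers/stavalues columns follow
the same 1..5 pattern):

SELECT staattnum, stakind1, stakind2, stakind3, stakind4, stakind5
FROM pg_statistic
WHERE starelid = 'pg_class'::regclass
LIMIT 1;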