| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me> |
| Subject: | Re: Add expressions to pg_restore_extended_stats() |
| Date: | 2026-02-06 04:50:20 |
| Message-ID: | CADkLM=fSvftbJ7fcGmokGGi_w2CbyvwDNg-3shtiZOsaqLamGA@mail.gmail.com |
| Lists: | pgsql-hackers |
>
>
> v6-0001 has less fuzz; thanks for cleaning up the whole set. I am looking
> at the patch, and immediately noted two concepts that caught my eye and
> look rather unreliable.
>
> +    if (!sta_ok)
> +        *exprs_is_perfect = false;
> +    isnull = false;
>
> This bit looks incorrect to me? If !sta_ok (or !row_is_perfect in
> import_pg_statistic()), then why is a non-NULL value inserted into the
> resulting array? If we fail to parse even one field or miss one key,
> we should force NULL for this single expression in the worst case and
> attempt to move on with the rest. But it does not matter anyway
> because import_expressions() would fail the import. Why don't we just
> skip the rest of the expressions then? We know that the resulting
> array will go to the garbage bit and that the restore failed, issuing
> a WARNING for the statext object.
>
This is where the metaphor between pg_statistic as an attribute statistic and
pg_statistic as an array element in stxdexpr breaks down a bit.

For pg_restore_attribute_stats(), our goal was to create a new stat if none
exists, and if one exists, then replace only those elements that are 1)
specified in the call and 2) successfully imported. This means that the
function can update a pg_statistic row but still return false, because not
all of the attributes specified were actually updated, i.e. it wasn't "perfect".
For pg_restore_extended_stats(), we've learned that we *must* have the
right number of elements in the pg_statistic array, so the notion of
replaceability is severely, if not fatally, weakened. However, there are
errors that we might want an individual pg_statistic to recover from, a
good example being keys that somehow get removed in a future version. So
while I've modified import_pg_statistic to return a null datum on any
inconsistency, that might not be the case in the future, and
import_expressions() should check whether it happens. Similarly,
import_expressions could have multiple pg_statistic rows, and if one of
them has an inconsistency, I'd still like the others to make it in.
That's implemented by comparing a counter of pg_statistic rows that
imported OK against the total number of exprs; if they match, then
import_expressions() was "perfect".
>
> I think that import_expressions() has its logic going backwards; by
> this I mean that initializing exprs_is_perfect to true could be risky.
> It seems to me that we should do the exact opposite: initialize it to
> false, and switch to true *if and only if* all the correct conditions
> we want are reached. I'd suggest a set of gotos and a single exit path
> at the end of import_expressions() where exprs_is_perfect is set to
> true to let the caller know that the expression can be safely used.
> Remember import_mcv(), as one example.
>
+1
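
For illustration, a minimal, self-contained sketch of that shape, with
hypothetical parse_ok/keys_ok checks standing in for the real validation
(again, not the actual patch):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Single-exit pattern: exprs_is_perfect starts false and is flipped to
 * true in exactly one place, only after every check has passed.
 */
static void
import_expressions_sketch(bool parse_ok, bool keys_ok, bool *exprs_is_perfect)
{
	*exprs_is_perfect = false;	/* pessimistic default */

	if (!parse_ok)
		goto done;				/* bail out; the flag stays false */
	if (!keys_ok)
		goto done;

	*exprs_is_perfect = true;	/* the only place it becomes true */

done:
	return;
}

int
main(void)
{
	bool		perfect;

	import_expressions_sketch(true, false, &perfect);
	printf("perfect = %s\n", perfect ? "true" : "false");	/* false */

	import_expressions_sketch(true, true, &perfect);
	printf("perfect = %s\n", perfect ? "true" : "false");	/* true */
	return 0;
}
```

The single exit means no failure path has to remember to clear the flag.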
> Similarly, the same concept should be applied to import_pg_statistic().
> row_is_perfect should be false to start, and switched to true once we
> are sure that everything we want is valid.
>
+1
> Also, I think that the format of the dump should be better when it
> comes to a set of expressions where some of them have invalid data.
> Even if the stats of an expression are invalid, pg_dump builds a JSON
> blob for the element of the expression with all the keys and their
> values set to NULL. I'd suggest just publishing a NULL for any
> element where invalid stats are found. That makes the dumps shorter
> as well.
>
Stats can't be invalid on the way out, only on the way in; I'm assuming
you mean null/pointless data. I used jsonb_strip_nulls() to remove
keys whose value is null, and nullif() to map empty objects to null,
and I think that's much tidier. We could still get a situation where all
the exprs are empty (i.e. [null,null,null]). There is no simple test to map
that to a plain null, but even if there were, the SQL is already
drifting into "clever" territory, and I know that pg_dump likes to keep
things very simple.
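
For what it's worth, here's a small, self-contained libpq sketch of that
mapping. It assumes a server reachable via the default connection settings,
and the stats keys fed to jsonb_build_object() are invented for the example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* default connection params */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return EXIT_FAILURE;
	}

	/* all keys are null, so strip_nulls yields '{}' and nullif maps it to NULL */
	res = PQexec(conn,
				 "SELECT nullif(jsonb_strip_nulls("
				 "jsonb_build_object('null_frac', NULL::real,"
				 " 'n_distinct', NULL::real)), '{}'::jsonb)");

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("result: %s\n",
			   PQgetisnull(res, 0, 0) ? "SQL NULL" : PQgetvalue(res, 0, 0));

	PQclear(res);
	PQfinish(conn);
	return EXIT_SUCCESS;
}
```

This collapses each per-expression object, but an all-empty array like
[null,null,null] survives it, exactly as noted above.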
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0001-Add-support-for-exprs-in-pg_restore_extended_stat.patch | text/x-patch | 79.1 KB |