| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Extended Statistics set/restore/clear functions. |
| Date: | 2026-01-19 06:51:28 |
| Message-ID: | aW3UcP6rGXMLjkwm@paquier.xyz |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jan 19, 2026 at 03:11:03PM +0900, Michael Paquier wrote:
> (Still looking at the patch in more details now, putting my hands on
> it..)
Attached is a v28, with a set of comments that I am letting up to
the author to address.
The patch set has a clear deficiency in test coverage: a lot of code
where we expect failures are totally uncovered, particularly things
related to MCV lists and parameters, MCV array checks, dimensions,
incorrect number of elements, expression imports, etc. There is a
large number of them. I'd expect all of these cases to have proper
coverage, with WARNINGs generated as we don't want hard failures on
these calls, do we? Some more validation for ndistinct and
dependencies would be welcome, as well.
Not surprising, but the patch is weak under trash inputs. I've
quickly come up with a test case that crashes the import, extracted
from a different case (see the POOHHA, that's the problematic value):
SELECT pg_catalog.pg_restore_extended_stats(
'schemaname', 'stats_import',
'relname', 'test_clone',
'statistics_schemaname', 'stats_import',
'statistics_name', 'test_stat_clone',
'inherited', false,
'most_common_vals', '{{four,NULL,0,NULL},{one,"(POOHHA,1.1,ONE,01-01-2001,\"{\"\"xkey\"\": \"\"xval\"\"}\")",1,2},{tre,"(3,3.3,TRE,03-03-2003,)",-1,3},{two,"(2,2.2,TWO,02-02-2002,\"[true, 4, \"\"six\"\"]\")",1,2}}'::text[],
'most_common_val_nulls', '{{f,t,f,t},{f,f,f,f},{f,f,f,f},{f,f,f,f}}'::boolean[],
'most_common_freqs', '{0.25,0.25,0.25,0.25}'::double precision[],
'most_common_base_freqs', '{0.00390625,0.015625,0.00390625,0.015625}'::double precision[]);
This has to be stable, meaning that we *must* be able to handle and
reject trash input data, and also relates to the lack of coverage
overall. I am pretty sure that if I begin to inject more buggy values
in other areas of these parameters, things could get funky.
I am wondering if we should not cut the pie into two parts here: the
dependencies and ndistinct data is actually so much more
straight-forward as they have predictive input patterns that we can
discard easiy if they are incorrect. The MCV list part with its
import logic and cross-validation of the inputs is something else
based on the attribute types. I don't think that we are far from a
good solution, still that would be one strategy to get something done
if we cannot get the MCV part completely right.
A lot of the code is still largely under-documented, with zero
explanation about pg_restore_extended_stats(), no explanation about
the individual fields that can be set in pg_restore_extended_stats().
It's impossible to understand for a reader what statext_mcv_import()
does based on its inputs and what a caller can give in input.
There is also zero documentation about the handling of the MCV lists,
especially the expectations around nulls, [base] frequences, common
values, aka all the most_common_* parameters. I do feel that we
should have some documentation about the other stxkinds as well, at
least some description of the parameter names, what they can do, or at
least what they refer to in the catalogs. Having this stuff implied
in the code with zero reference in the documentation is like driving a
car blind, and I doubt it's good for one's health.
My modifications have been added as of 0003, for inclusion with the
rest.
--
Michael
| Attachment | Content-Type | Size |
|---|---|---|
| v28-0001-Add-pg_restore_extended_stats.patch | text/x-diff | 100.7 KB |
| v28-0002-Include-Extended-Statistics-in-pg_dump.patch | text/x-diff | 14.0 KB |
| v28-0003-Several-fixes-and-adjustments.patch | text/x-diff | 15.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Smith | 2026-01-19 07:02:33 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | Tatsuo Ishii | 2026-01-19 06:45:10 | Re: Row pattern recognition |