| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us |
| Subject: | Re: Extended Statistics set/restore/clear functions. |
| Date: | 2025-11-11 08:06:49 |
| Message-ID: | aRLumVDIQHUTKQYG@paquier.xyz |
| Lists: | pgsql-hackers |
On Mon, Nov 10, 2025 at 12:33:40AM -0500, Corey Huinker wrote:
> It may not be quite what you wanted, but the attribute names are now static
> constants in the new adt C files. It's possible/probable that you wanted
> them in some header file; so far I haven't had to create any new header
> files, but that can be done if desired.
No, that's not the best thing we can do with the dump/restore pieces
in mind. Let's put that in a separate header.
> That's done in the 0008-0009 patches. If I were starting from scratch, I
> would have moved the pre-existing in/out/send/recv functions to their own
> files in their own patches before changing the output format, but tacked on
> at the end like they are, it's easier to see what the changes were, and the
> patches will probably get squashed together anyway.
Thanks for the new patch. And FWIW I disagree with this approach:
cleanup and refactoring pieces make more sense if done first, as they
lead to less code churn in the final result. So... I've begun to put
my hands on the patch set. The whole set has been restructured a bit, as
per the attached. Patches 0001 to 0004 feel OK here; these include two
code moves and the two output functions:
- Two new files for adt/, which I'm planning to apply soon as a
separate cleanup.
- New output functions, with keys added to a new header named
statistics_format.h, for frontend and backend consumption.
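A shared header keeps the JSON key names in one place, so the backend input/output functions and frontend code such as pg_dump cannot drift apart. A minimal sketch of what such a header might hold, plus a helper that formats one pg_ndistinct item with it; the macro and function names are hypothetical, only the "attributes" and "ndistinct" key strings come from the examples in this message, and the exact whitespace of the real output may differ:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical contents of a shared header like statistics_format.h.
 * Macro names are illustrative; the key strings match the JSON examples.
 */
#define PG_NDISTINCT_KEY_ATTRIBUTES "attributes"
#define PG_NDISTINCT_KEY_NDISTINCT  "ndistinct"

/*
 * Hypothetical helper: format one item as
 * {"attributes": [a, b, ...], "ndistinct": X} into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
format_ndistinct_item(char *buf, size_t buflen,
                      const int *attrs, int nattrs, double ndistinct)
{
    size_t      used;
    int         n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "{\"%s\": [", PG_NDISTINCT_KEY_ATTRIBUTES);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    /* Attribute numbers, comma-separated */
    for (int i = 0; i < nattrs; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s%d",
                     i > 0 ? ", " : "", attrs[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, "], \"%s\": %g}",
                 PG_NDISTINCT_KEY_NDISTINCT, ndistinct);
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return 0;
}
```

With attributes {2, 3} and an ndistinct of 4, this produces `{"attributes": [2, 3], "ndistinct": 4}`; a parser on the input side would compare incoming keys against the same macros.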
Next come the input functions. First, I am unhappy with the amount
of testing that has been put into ndistinct, the first and only input
function I've looked at in detail so far. I quickly spotted a couple
of issues while testing buggy input, like this one, which crashes on a
pointer dereference (not good, obviously):
SELECT '[]'::pg_ndistinct;
There was a second issue with the error message generated when an
incorrect key value is used.
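Both problems point to checks that belong at well-defined points of the parse: validate each object key as it arrives, and validate the accumulated state once parsing ends, before anything dereferences the first item. A minimal sketch, assuming a parser that counts completed objects in a state struct; the struct, function names and message texts are hypothetical, and a real backend implementation would raise an error through the usual error-reporting path rather than fill a buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulation state filled by the JSON parser callbacks. */
typedef struct NDistinctParseState
{
    int         nitems;         /* number of completed {...} objects seen */
} NDistinctParseState;

/*
 * Check an object key against the two keys used by pg_ndistinct.
 * On failure, the message names the offending key.
 */
static bool
ndistinct_check_key(const char *key, char *errbuf, size_t errlen)
{
    if (strcmp(key, "attributes") == 0 || strcmp(key, "ndistinct") == 0)
        return true;
    snprintf(errbuf, errlen,
             "unexpected key \"%s\", expected \"attributes\" or \"ndistinct\"",
             key);
    return false;
}

/*
 * Run once parsing is finished.  An input such as '[]' yields zero items
 * and is rejected here instead of reaching code that assumes at least one
 * item exists.
 */
static bool
ndistinct_check_finished(const NDistinctParseState *state,
                         char *errbuf, size_t errlen)
{
    assert(state != NULL);
    if (state->nitems == 0)
    {
        snprintf(errbuf, errlen,
                 "pg_ndistinct value must contain at least one item");
        return false;
    }
    return true;
}
```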
Second, the inputs are too permissive and could be checked more
strictly IMHO. For example, patterns like these are incorrect, but are
still accepted with only the patches up to 0005 applied:
- Duplicated list of attributes:
SELECT '[{"attributes" : [2,3], "ndistinct" : 4},
{"attributes" : [2,3], "ndistinct" : 4}]'::pg_ndistinct;
- Partial (K,N) sets: for example, say we take stats on attributes
(1,2,3); a partial input like this one is accepted as-is:
SELECT '[{"attributes" : [1,3], "ndistinct" : 4},
{"attributes" : [1,2,3], "ndistinct" : 4}]'::pg_ndistinct;
These are checked in the patches that introduce the import functions,
with pg_ndistinct_validate_items(), based on the list of stxkeys we
have. However, I don't think that this is enough by itself. Shouldn't
we check that the list of items in the array is what we expect based
on at least the longest "attributes" array, right after the JSON has
been parsed? That would be cheap to check in the input function itself,
as a first layer of checks before trying anything with the import
function and cross-checking the list of attributes for the extended
statistics object. This means checking that, for N attributes, the
array contains every combination of attributes we'd expect, without
gaps or duplicates, with an extra step done once the JSON parsing is
finished. Except for this sanity issue, this part of the patch set
should be mostly OK, modulo more cleanup and more typo/grammar
fixes.
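The post-parse check described above can be sketched as follows: take the longest "attributes" array as the full set of N attributes, map every item onto a bitmask over that set, reject unknown or repeated attributes and duplicated combinations, then require exactly 2^N - N - 1 items (the number of subsets with at least two of the N attributes). A minimal sketch, assuming items have already been parsed into a fixed-size struct; the struct, constant and function names are hypothetical, and the 8-attribute cap is assumed to mirror the column limit on extended statistics objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITEM_ATTRS 8        /* assumed cap on attributes per object */

/* Hypothetical parsed form of one {"attributes": [...], "ndistinct": X}. */
typedef struct NDistinctItemIn
{
    int         nattrs;
    int         attrs[MAX_ITEM_ATTRS];
    double      ndistinct;
} NDistinctItemIn;

/* Position of attnum within the full attribute list, or -1 if absent. */
static int
attr_position(const int *full, int nfull, int attnum)
{
    for (int i = 0; i < nfull; i++)
        if (full[i] == attnum)
            return i;
    return -1;
}

/*
 * Return true only if the items cover every combination of 2..N
 * attributes exactly once, N being the longest "attributes" array.
 */
static bool
ndistinct_items_complete(const NDistinctItemIn *items, int nitems)
{
    const NDistinctItemIn *longest;
    bool        seen[1 << MAX_ITEM_ATTRS];
    int         nfull;

    assert(items != NULL || nitems == 0);
    if (nitems == 0)
        return false;

    /* The longest "attributes" array defines the full attribute set. */
    longest = &items[0];
    for (int i = 1; i < nitems; i++)
        if (items[i].nattrs > longest->nattrs)
            longest = &items[i];
    nfull = longest->nattrs;
    if (nfull < 2 || nfull > MAX_ITEM_ATTRS)
        return false;

    memset(seen, 0, sizeof(seen));
    for (int i = 0; i < nitems; i++)
    {
        unsigned    mask = 0;

        if (items[i].nattrs < 2)
            return false;
        for (int j = 0; j < items[i].nattrs; j++)
        {
            int         pos = attr_position(longest->attrs, nfull,
                                            items[i].attrs[j]);

            if (pos < 0 || (mask & (1u << pos)))
                return false;   /* unknown or repeated attribute */
            mask |= 1u << pos;
        }
        if (seen[mask])
            return false;       /* duplicated combination */
        seen[mask] = true;
    }

    /* All masks are distinct subsets of size >= 2: count must be full. */
    return nitems == (1 << nfull) - nfull - 1;
}
```

Both inputs quoted earlier fail this check: the duplicated [2,3] items trip the `seen` test, and the partial [1,3] plus [1,2,3] input has 2 items where 4 are expected. The cross-check against stxkeys in the import function would still be needed on top of this.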
I suspect a similar family of issues with pg_dependencies, and it
would be nice to move the tests of its input function into a new
regression file, as for the other one.
I've rebased the full set using the new structure. 0001~0004 are
clean. 0005~ need more work and analysis, but that's a start.
--
Michael
| Attachment | Content-Type | Size |
|---|---|---|
| v10-0001-Make-pg_ndinstinct-a-proper-adt.patch | text/x-diff | 6.5 KB |
| v10-0002-Make-pg_dependencies-a-proper-adt.patch | text/x-diff | 7.7 KB |
| v10-0003-Refactor-output-format-of-pg_ndistinct.patch | text/x-diff | 17.0 KB |
| v10-0004-Refactor-output-format-of-pg_dependencies.patch | text/x-diff | 11.1 KB |
| v10-0005-Add-working-input-function-for-pg_ndistinct.patch | text/x-diff | 21.5 KB |
| v10-0006-Add-working-input-function-for-pg_dependencies.patch | text/x-diff | 17.4 KB |
| v10-0007-Expose-attribute-statistics-functions-for-use-in.patch | text/x-diff | 10.7 KB |
| v10-0008-Add-extended-statistics-support-functions.patch | text/x-diff | 113.2 KB |
| v10-0009-Include-Extended-Statistics-in-pg_dump.patch | text/x-diff | 13.6 KB |