Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2025-11-14 05:49:23
Message-ID: CADkLM=d_mmTn3+ChL-z_ZomwOKT-BssQU+=6_a8WDv6A4b7uAA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> extenssted statistics surely won't work on system columns,
> how should we deal with case like:
> ```
> {"attributes": [6, -1], "ndistinct": 14}
> {"attributes": [6, -7], "ndistinct": 14},
> ```
> issue a warning or error out saying that your attribute number is invalid?
> Should we discourage using system columns as examples in comments here?
>

Negative numbers represent the Nth expression defined in the extended
statistics object. So if you have extended statistics on a, b, length(a),
length(b) then you can legally have -1 and -2 in the attributes, but
nothing lower than that.
See functions pg_ndistinct_validate_items() and
pg_depdendencies_validate_deps() as these check the attributes in the value
against the definition of the extended stats object.

Though this does bring up a small naming problem: Elements of a
pg_dependencies are Dependency, abbreviated to dep, but are also called
Items like the elements in a pg_ndistinct. We should pick a standard name
for such things (probably item) and use it everywhere.

>
> I have added more test code in src/test/regress/sql/pg_ndistinct.sql,
> to improve the code coverage.
>

I'm trying to implement those test cases, but I may have missed some.

>
>
> this (and many other places) looks wrong, because
> ereturn would really return ``(Datum) 0``, and this function returns
> JsonParseErrorType.
> so we have to errsave here.
>

Good point. Implemented.

>
> NDistinctParseState.ndistinct should be integer,
> otherwise pg_ndistinct_out will not be consistent with pg_ndistinct_in?
>

+1

Implemented many, but not all of these suggestions.

Attachment Content-Type Size
v13-0001-Refactor-output-format-of-pg_ndistinct.patch text/x-patch 17.2 KB
v13-0002-Refactor-output-format-of-pg_dependencies.patch text/x-patch 12.1 KB
v13-0003-Add-working-input-function-for-pg_ndistinct.patch text/x-patch 29.3 KB
v13-0004-Add-working-input-function-for-pg_dependencies.patch text/x-patch 34.3 KB
v13-0005-Expose-attribute-statistics-functions-for-use-in.patch text/x-patch 10.7 KB
v13-0006-Add-extended-statistics-support-functions.patch text/x-patch 166.2 KB
v13-0007-Include-Extended-Statistics-in-pg_dump.patch text/x-patch 13.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-11-14 06:04:25 Re: pgsql: Drop unnamed portal immediately after execution to completion
Previous Message Japin Li 2025-11-14 05:49:07 Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement