Re: list of extended statistics on psql

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: list of extended statistics on psql
Date: 2021-01-19 00:44:31
Message-ID: 4948ebb7-0992-8df2-2b14-f84472b46cc5@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

>>> As for how to deal with this, I can think of about three ways:
>>>
>>> 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are built or sizes)
>>>
>>> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>>>
>>> 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified)
>>>
>>> 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly to what pg_stats_ext does)
>>>
>>> Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions there). So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that for new releases, though.
>>
>>
>> Thanks for the useful advice. I go with option 1).
>> The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However, I was confused
>> about writing a part of the query for calculating MCV size because
>> there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and most_common_base_freqs.
>> Currently, I don't know how to calculate the size of MCV by using the
>> four columns. Thoughts? :-)
>
> Well, my suggestion was to use pg_statistic_ext, because that lists all statistics, while pg_stats_ext is filtering statistics depending on access privileges. I think that's more appropriate for \dX, the contents should not change depending on the user.
>
> Also, let me clarify - with option (1) we'd not show the sizes at all. The size of the formatted statistics may be very different from the on-disk representation, so I see no point in showing it in \dX.
>
> We might show other stats (e.g. number of MCV items, or the fraction of data represented by the MCV list), but the user can inspect pg_stats_ext if needed.
>
> What we might do is to show those stats when a superuser is running this command, but I'm not sure that's a good idea (or how difficult would it be to implement).

Thanks for clarifying.
I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext.
And we don't need the size of stats.

If that's the case, we also can't get the status of stats since PG12 or later
because we can't use pg_statistic_ext_data, as you know. Therefore, it would be
better to replace the query with the old query that I sent five months ago like this:

# the old query
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxrelid::pg_catalog.regclass AS "Table",
stxname AS "Name",
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
'd' = any(stxkind) AS "Ndistinct",
'f' = any(stxkind) AS "Dependencies",
'm' = any(stxkind) AS "MCV"
FROM pg_catalog.pg_statistic_ext stat
ORDER BY 1,2;

Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+--------+------------+---------+-----------+--------------+-----
public | hoge1 | hoge1_ext | a, b | t | t | t
public | hoge_t | hoge_t_ext | a, b | t | t | t
(2 rows)

The above query is so simple so that we would better to use the following query:

# This query works on PG10 or later
SELECT
es.stxnamespace::pg_catalog.regnamespace::text AS "Schema",
es.stxname AS "Name",
pg_catalog.format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
FROM pg_catalog.unnest(es.stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a
ON (es.stxrelid = a.attrelid
AND a.attnum = s.attnum
AND NOT a.attisdropped)),
es.stxrelid::regclass) AS "Definition",
CASE WHEN 'd' = any(es.stxkind) THEN 'defined'
END AS "Ndistinct",
CASE WHEN 'f' = any(es.stxkind) THEN 'defined'
END AS "Dependencies",
CASE WHEN 'm' = any(es.stxkind) THEN 'defined'
END AS "MCV"
FROM pg_catalog.pg_statistic_ext es
ORDER BY 1, 2;

Schema | Name | Definition | Ndistinct | Dependencies | Dependencies
--------+------------+------------------+-----------+--------------+--------------
public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined
public | hoge_t_ext | a, b FROM hoge_t | defined | defined | defined
(2 rows)

I'm going to create the WIP patch to use the above queriy.
Any comments welcome. :-D

Thanks,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2021-01-19 00:53:41 RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Tom Lane 2021-01-19 00:29:11 Re: [PATCH 1/1] Initial mach based shared memory support.