Re: list of extended statistics on psql

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: list of extended statistics on psql
Date: 2020-08-30 23:56:52
Message-ID: 2a58a2c3-3897-afe9-7c5e-3b2e91cdb021@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alvaro,

>>> IMO the per-type columns should show both the type being enabled as
>>> well as it being built.
>>
>> Hmm. I'm not sure how to get the status (enabled or disabled) of
>> extended stats. :(
>> Could you explain it more?
>
> pg_statistic_ext_data.stxdndistinct is not null if the stats have been
> built. (I'm not sure whether there's an easier way to determine this.)

Ah.. I see! Thank you.

> I suggest to do this
>
> Name | Schema | Definition | Ndistinct | Dependencies | MCV
> -----------+--------+--------------------------+-----------+--------------+-----
> stts_1 | public | (a, b) FROM t1 | f | t | f
>
>> I suppose that the current column order is sufficient if there is
>> no improvement of extended stats on PG14. Do you know any plan to
>> improve extended stats such as to allow it to cross multiple tables on PG14?
>
> I suggest that changing it in the future is going to be an uphill
> battle, so better get it right from the get go, without requiring a
> future restructure.

I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column.

>> Currently, I use this query to get Extended stats info from pg_statistic_ext.
>
> Maybe something like this would do
>
> SELECT
> stxnamespace::pg_catalog.regnamespace AS "Schema",
> stxname AS "Name",
> format('%s FROM %s',
> (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
> FROM pg_catalog.unnest(stxkeys) s(attnum)
> JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
> a.attnum = s.attnum AND NOT attisdropped)),
> stxrelid::regclass) AS "Definition",
> CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
> CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
> CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
> FROM pg_catalog.pg_statistic_ext es
> INNER JOIN pg_catalog.pg_class c
> ON stxrelid = c.oid
> LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
> ORDER BY 1, 2, 3;

Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D
I got the below results by your query.

========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

insert into t1 select i,i from generate_series(1,100) i;
analyze t1;

Your query gave this result:

Schema | Name | Definition | n-distinct | functional dependencies | mcv
--------+-----------+----------------------------+--------------------+-------------------------+--------------------
public | stts_1 | a, b FROM t1 | | built |
public | stts_2 | a, b FROM t1 | built | built |
public | stts_3 | a, b FROM t1 | built | built | built
public | stts_4 | b, c FROM t2 | enabled, not built | enabled, not built | enabled, not built
public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built | enabled, not built
(5 rows)
========

I guess "enabled, not built" is a little redundant. The status would better to
have three patterns: "built", "not built" or nothing (NULL) like these:

- "built": extended stats is defined and built (collected by analyze cmd)
- "not built": extended stats is defined but have not built yet
- nothing (NULL): extended stats is not defined

What do you think about it?

I will send a new patch including :

- Replace "Columns" and "Table" column with "Definition"
- Show the status (built/not built/null) of extended stats by using
pg_statistic_ext_data

Thanks,
Tatsuro Yamada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-08-31 00:23:53 Re: Get rid of runtime handling of AlternativeSubPlan?
Previous Message Stephen Frost 2020-08-30 23:20:01 Re: New default role- 'pg_read_all_data'