Re: list of extended statistics on psql

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: list of extended statistics on psql
Date: 2020-09-02 23:45:17
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> >> I wonder if trying to list info about all stats from the statistics
> >> object in a single line is necessary. Maybe we should split the info
> >> into one line per statistics, so for example
> >>
> >>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
> >>
> >> would result in three lines in the \dX output. The statistics name would
> >> identify which lines belong together, but other than that the pieces are
> >> mostly independent.
> >
> >Yeah, that's what I'm suggesting.  I don't think we need to repeat the
> >name/definition for each line though.
> >
> >It might be useful to know how does pspg show a single entry that's
> >split in three lines, though.
> >
> Ah, I didn't realize you're proposing that - I assumed it's broken
> simply to make it readable, or something like that. I think the lines
> are mostly independent, so I'd suggest to include the name of the object
> on each line. The question is whether this independence will remain true
> in the future - for example histograms would be built only on data not
> represented by the MCV list, so there's a close dependency there.
> Not sure about pspg, and I'm not sure it matters too much.
> pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where pspg detects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than input) order.

I try to summarize the discussion so far.
Is my understanding right? Could you revise it if it has something wrong?

* Summary

1. "\dX[+]" doesn't display the Size of extended stats since the size is
useful only for the development process of the stats.

2. each row should have stats name, definition, type, and status.
For example:

statname | definition | type |
someobj | (a, b) FROM tab | n-distinct: built |
someobj | (a, b) FROM tab | func-dependencies: built |
someobj | (a, b) FROM tab | mcv: built |
sttshoge | (a, b) FROM hoge | n-distinct: required |
sttshoge | (a, b) FROM hoge | func-dependencies:required|
sttscross| (a, b) FROM t1,t2| n-distinct: required |

My opinion is below:

For 1., Agreed. I will remove it on the next patch.
For 2., I feel the design is not beautiful so I'd like to change it.
The reasons are:

- I think that even if we expected the number of types increasing two times,
each type would be better to put as columns, not lines.
Repeating items (the stats name and definition) should be removed.
It's okay there are many columns in the future like "\df+" because we can
use "\x" mode to display if we need it.

- The type column has two kinds of data, the one is stats type and another
is status. We know the word "One fact in One place" for data modeling in
the RDBMS world so it would be better to divide it.
I'd like to suggest the bellow design of the view.

statname | definition | n-distinct | func-dependencies | mcv |
someobj | (a, b) FROM tab | built | built | built |
sttshoge | (a, b) FROM hoge | required | required | |
sttscross| (a, b) FROM t1,t2| required | | |

Any thoughts?

Tatsuro Yamada

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-09-03 00:09:29 Re: Two fsync related performance issues?
Previous Message Thomas Munro 2020-09-02 23:30:50 Re: Two fsync related performance issues?