Re: progress report for ANALYZE

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp
Cc: alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: progress report for ANALYZE
Date: 2019-07-11 10:56:10
Message-ID: 20190711.195610.175868438.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad(at)nttcom(dot)co(dot)jp>
> Hi Alvaro, Anthony, Julien and Robert,
>
> On 2019/07/09 3:47, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas(at)gmail(dot)com>
> > wrote:
> >>
> >> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
> >> <alvherre(at)2ndquadrant(dot)com> wrote:
> >>> Yeah, I got the impression that that was determined to be the
> >>> desirable
> >>> behavior, so I made it do that, but I'm not really happy about it
> >>> either. We're not too late to change the CREATE INDEX behavior, but
> >>> let's discuss what is it that we want.
> >>
> >> I don't think I intended to make any such determination -- which
> >> commit do you think established this as the canonical behavior?
> >>
> >> I propose that once a field is set, we should leave it set until the
> >> end.
> > +1
> > Note that this patch is already behaving like that if the table only
> > contains dead rows.

+1 from me.

> I fixed the patch including:
>
> - Replace "if" to "else if". (Suggested by Julien)
> - Fix typo s/ech/each/. (Suggested by Anthony)
> - Add Phase "analyzing complete" in the pgstat view. (Suggested by
> - Julien, Robert and me)
> It was overlooked to add it in system_views.sql.
>
> I share my re-test result, see below:
>
> ---------------------------------------------------------
> [Session #1]
> create table hoge as select * from generate_series(1, 1000000) a;
> analyze verbose hoge;
>
> [Session #2]
> \a \t
> select * from pg_stat_progress_analyze; \watch 0.001
>
> 3785|13599|postgres|16384|f|16384|scanning table|4425|6
> 3785|13599|postgres|16384|f|16384|scanning table|4425|31
> 3785|13599|postgres|16384|f|16384|scanning table|4425|70
> 3785|13599|postgres|16384|f|16384|scanning table|4425|109
> ...
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|analyzing sample|0|0
> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> fixed. :)
> ---------------------------------------------------------

I have some comments.

+ const int index[] = {
+ PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_BLOCKS_DONE
+ };
+ const int64 val[] = {
+ PROGRESS_ANALYZE_PHASE_ANALYSIS,
+ 0, 0

Do you oppose to leaving the total/done blocks alone here:p?

+ BlockNumber nblocks;
+ double blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+ pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+ ++blksdone);

It is converted into int64.

+ WHEN 2 THEN 'analyzing sample'
+ WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+ WHEN 2 THEN 'computing stats'
+ WHEN 3 THEN 'computing extended stats'

+ WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+ WHEN 4 THEN 'completing analyze'
+ WHEN 4 THEN 'finalizing analyze'

+ <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+ <entry>OID of the database to which this backend is connected.</entry>
+ <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)

+ <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

+ The table being scanned (differs from <literal>relid</literal>
+ only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.

+ Total number of heap blocks to scan in the current table.

Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

+ The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.

+ <command>ANALYZE</command> is currently extracting statistical data
+ from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.

regards.

-
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2019-07-11 11:12:36 Re: base backup client as auxiliary backend process
Previous Message Richard Guo 2019-07-11 10:27:46 Re: [HACKERS] WIP: Aggregation push-down