Re: progress report for ANALYZE

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: progress report for ANALYZE
Date: 2019-12-09 08:51:24
Message-ID: CA+HiwqFMJfOy4S2GGO==UC-BB5PFNJ_qRnKV+HmBKy0PDtnQ5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yamada-san,

On Fri, Dec 6, 2019 at 3:24 PM Tatsuro Yamada
<tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote:
>>> 1)
> >>> For now, I'm not sure it should be set current_child_table_relid to zero
> >>> when the current phase is changed from "acquiring inherited sample rows" to
> >>> "computing stats". See <Test result> bellow.
> >>
> >> In the upthread discussion [1], Robert asked to *not* do such things,
> >> that is, resetting some values due to phase change. I'm not sure his
> >> point applies to this case too though.
>
> //Below "computing stats" is for the partitioned table hoge,
> //therefore the second column from the left side would be
> //better to set Zero to easy to understand.

On second thought, maybe we should not reset, because that might be
considered loss of information. To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.

> >>> 2)
> >>> There are many "finalizing analyze" phases based on relids in the case
> >>> of partitioning tables. Would it better to fix the document? or it
> >>> would be better to reduce it to one?
> >>>
> It would be better to modify the document of "finalizing analyze" phase.
>
> # Before modify
> The command is updating pg_class. When this phase is completed,
> <command>ANALYZE</command> will end.
>
> # Modified
> The command is updating pg_class. When this phase is completed,
> <command>ANALYZE</command> will end. In the case of partitioned table,
> it might be shown on each partitions.
>
> What do you think that? I'm going to fix it, if you agreed. :)

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes. ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition. So, I don't think we need to clarify that if only in
one phase's description. Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables. Like this:

<note>
<para>
Note that when <command>ANALYZE</command> is run on a partitioned table,
all of its partitions are also recursively analyzed as also mentioned on
<xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command>
progress is reported first for the parent table, whereby its inheritance
statistics are collected, followed by that for each partition.
</para>
</note>

Some more comments on the documentation:

+ Number of computed extended stats. This counter only advances
when the phase
+ is <literal>computing extended stats</literal>.

Number of computer extended stats -> Number of extended stats computed

+ Number of analyzed child tables. This counter only advances
when the phase
+ is <literal>computing extended stats</literal>.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you
meant "when the phase is <literal>acquiring inherited sample
rows</literal>. I suggest to write this as follows:

Number of child tables scanned. This counter only advances when the phase
is <literal>acquiring inherited sample rows</literal>.

+ <entry>OID of the child table currently being scanned.
+ It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned. This field is only valid when
the phase is <literal>computing extended stats</literal>.

+ The command is currently scanning the
<structfield>current_relid</structfield>
+ to obtain samples.

I suggest:

The command is currently scanning the the table given by
<structfield>current_relid</structfield> to obtain sample rows.

+ The command is currently scanning the
<structfield>current_child_table_relid</structfield>
+ to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows. Columns
<structfield>child_tables_total</structfield>,
<structfield>child_tables_done</structfield>, and
<structfield>current_child_table_relid</structfield> contain the progress
information for this phase.

+ <row>
+ <entry><literal>computing stats</literal></entry>

I think the phase name should really be "computing statistics", that
is, use the full word.

+ <entry>
+ The command is computing stats from the samples obtained
during the table scan.
+ </entry>
+ </row>

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan

+ <entry><literal>computing extended stats</literal></entry>
+ <entry>
+ The command is computing extended stats from the samples
obtained in the previous phase.
+ </entry>

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PikachuEXE 2019-12-09 08:57:34 Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes
Previous Message Dilip Kumar 2019-12-09 07:56:59 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions