Re: progress report for ANALYZE

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: progress report for ANALYZE
Date: 2019-12-18 05:44:04
Message-ID: a0f1615b-7b5d-6759-62be-453ef0cb3686@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit-san,

> >>> 1)
>>>>> For now, I'm not sure it should be set current_child_table_relid to zero
>>>>> when the current phase is changed from "acquiring inherited sample rows" to
>>>>> "computing stats". See <Test result> bellow.
>>>>
>>>> In the upthread discussion [1], Robert asked to *not* do such things,
>>>> that is, resetting some values due to phase change. I'm not sure his
>>>> point applies to this case too though.
>>
>> //Below "computing stats" is for the partitioned table hoge,
>> //therefore the second column from the left side would be
>> //better to set Zero to easy to understand.
>
> On second thought, maybe we should not reset, because that might be
> considered loss of information. To avoid confusion, we should simply
> document that current_child_table_relid is only valid during the
> "acquiring inherited sample rows" phase.

Okay, agreed. :)


>>>>> 2)
>>>>> There are many "finalizing analyze" phases based on relids in the case
>>>>> of partitioning tables. Would it better to fix the document? or it
>>>>> would be better to reduce it to one?
>>>>>
>> It would be better to modify the document of "finalizing analyze" phase.
>>
>> # Before modify
>> The command is updating pg_class. When this phase is completed,
>> <command>ANALYZE</command> will end.
>>
>> # Modified
>> The command is updating pg_class. When this phase is completed,
>> <command>ANALYZE</command> will end. In the case of partitioned table,
>> it might be shown on each partitions.
>>
>> What do you think that? I'm going to fix it, if you agreed. :)
>
> *All* phases are repeated in this case, not not just "finalizing
> analyze", because ANALYZE repeatedly runs for each partition after the
> parent partitioned table's ANALYZE finishes. ANALYZE's documentation
> mentions that analyzing a partitioned table also analyzes all of its
> partitions, so users should expect to see the progress information for
> each partition. So, I don't think we need to clarify that if only in
> one phase's description. Maybe we can add a note after the phase
> description table which mentions this implementation detail about
> partitioned tables. Like this:
>
> <note>
> <para>
> Note that when <command>ANALYZE</command> is run on a partitioned table,
> all of its partitions are also recursively analyzed as also mentioned on
> <xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command>
> progress is reported first for the parent table, whereby its inheritance
> statistics are collected, followed by that for each partition.
> </para>
> </note>

Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.

> Some more comments on the documentation:
>
> + Number of computed extended stats. This counter only advances
> when the phase
> + is <literal>computing extended stats</literal>.
>
> Number of computed extended stats -> Number of extended stats computed

Will fix.


> + Number of analyzed child tables. This counter only advances
> when the phase
> + is <literal>computing extended stats</literal>.
>
> Regarding, "Number of analyzed child table", note that we don't
> "analyze" child tables in this phase, only scan its blocks to collect
> samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you
> meant "when the phase is <literal>acquiring inherited sample
> rows</literal>. I suggest to write this as follows:
>
> Number of child tables scanned. This counter only advances when the phase
> is <literal>acquiring inherited sample rows</literal>.

Oops, I will fix it. :)


> + <entry>OID of the child table currently being scanned.
> + It might be different from relid when analyzing tables that
> have child tables.
>
> I suggest:
>
> OID of the child table currently being scanned. This field is only valid when
> the phase is <literal>computing extended stats</literal>.

Will fix.

> + The command is currently scanning the
> <structfield>current_relid</structfield>
> + to obtain samples.
>
> I suggest:
>
> The command is currently scanning the the table given by
> <structfield>current_relid</structfield> to obtain sample rows.

Will fix.


> + The command is currently scanning the
> <structfield>current_child_table_relid</structfield>
> + to obtain samples.
>
> I suggest (based on phase description pg_stat_progress_create_index
> phase descriptions):
>
> The command is currently scanning child tables to obtain sample rows. Columns
> <structfield>child_tables_total</structfield>,
> <structfield>child_tables_done</structfield>, and
> <structfield>current_child_table_relid</structfield> contain the progress
> information for this phase.

Will fix.

> + <row>
> + <entry><literal>computing stats</literal></entry>
>
> I think the phase name should really be "computing statistics", that
> is, use the full word.

Will fix.


> + <entry>
> + The command is computing stats from the samples obtained
> during the table scan.
> + </entry>
> + </row>
>
> So I suggest:
>
> The command is computing statistics from the sample rows obtained during
> the table scan

Will fix.


> + <entry><literal>computing extended stats</literal></entry>
> + <entry>
> + The command is computing extended stats from the samples
> obtained in the previous phase.
> + </entry>
>
> I suggest:
>
> The command is computing extended statistics from the sample rows obtained
> during the table scan.

Will fix.

Thanks for your above useful suggestions. It helps me a lot. :)

Cheers!
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-18 06:03:31 Re: [HACKERS] Block level parallel vacuum
Previous Message Fujii Masao 2019-12-18 05:10:04 Re: archive status ".ready" files may be created too early