Re: progress report for ANALYZE

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: progress report for ANALYZE
Date: 2019-12-06 06:23:58
Message-ID: 717424c6-0e8c-5995-d96b-a8bcfbbb0336@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit-san,

>>> I wonder two things below. What do you think?
>>>
>>> 1)
>>> For now, I'm not sure it should be set current_child_table_relid to zero
>>> when the current phase is changed from "acquiring inherited sample rows" to
>>> "computing stats". See <Test result> bellow.
>>
>> In the upthread discussion [1], Robert asked to *not* do such things,
>> that is, resetting some values due to phase change.  I'm not sure his
>> point applies to this case too though.
>
> Yeah, I understood.
> I'll check target relid of "computing stats" to re-read a code of
> analyze command later. :)

Finally, I understood after investigation of the code. :)
Call stack is the following, and analyze_rel() calls "N + 1" times
for partitioned table and each partitions.

analyze_rel start
do_analyze_rel inh==true start
onerel: hoge2
acq_inh_sample_rows start
childrel: hoge2_10000
childrel: hoge2_20000
childrel: hoge2_30000
childrel: hoge2_default
acq_inh_sample_rows end
compute_stats start
compute_stats end
compute_index_stats start
compute_index_stats end
finalizing start
finalizing end
do_analyze_rel inh==true end
analyze_rel end
...

Also, I checked my test result. ("//" is my comments)

# select oid,relname,relkind from pg_class where relname like 'hoge2%';
oid | relname | relkind
-------+---------------+---------
36081 | hoge2 | p
36084 | hoge2_10000 | r
36087 | hoge2_20000 | r
36090 | hoge2_30000 | r
36093 | hoge2_default | r
(6 rows)

# select relid,
current_child_table_relid,
phase,
sample_blks_total,
sample_blks_scanned,
ext_stats_total,
ext_stats_computed,
child_tables_total,
child_tables_done
from pg_stat_progress_analyze; \watch 0.00001

== for partitioned table hoge2 ==
//hoge2_10000
36081|36084|acquiring inherited sample rows|45|20|0|0|4|0
36081|36084|acquiring inherited sample rows|45|42|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0

//hoge2_20000
36081|36087|acquiring inherited sample rows|45|3|0|0|4|1
36081|36087|acquiring inherited sample rows|45|31|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1

//hoge2_30000
36081|36090|acquiring inherited sample rows|45|12|0|0|4|2
36081|36090|acquiring inherited sample rows|45|35|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2

//hoge2_default
36081|36093|acquiring inherited sample rows|45|18|0|0|4|3
36081|36093|acquiring inherited sample rows|45|38|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.
//I guessd that user think which relid is the target of
//"computing stats"?!
//Of course, other option is to write it on document.

36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|finalizing analyze |45|45|0|0|4|4

== for each partitions such as hoge2_10000 ... hoge2_default ==

//hoge2_10000
36084|0|acquiring sample rows |45|25|0|0|0|0
36084|0|computing stats |45|45|0|0|0|0
36084|0|computing extended stats|45|45|0|0|0|0
36084|0|finalizing analyze |45|45|0|0|0|0

//hoge2_20000
36087|0|acquiring sample rows |45|14|0|0|0|0
36087|0|computing stats |45|45|0|0|0|0
36087|0|computing extended stats|45|45|0|0|0|0
36087|0|finalizing analyze |45|45|0|0|0|0

//hoge2_30000
36090|0|acquiring sample rows |45|12|0|0|0|0
36090|0|acquiring sample rows |45|44|0|0|0|0
36090|0|computing extended stats|45|45|0|0|0|0
36090|0|finalizing analyze |45|45|0|0|0|0

//hoge2_default
36093|0|acquiring sample rows |45|10|0|0|0|0
36093|0|acquiring sample rows |45|43|0|0|0|0
36093|0|computing extended stats|45|45|0|0|0|0
36093|0|finalizing analyze |45|45|0|0|0|0

>>> 2)
>>> There are many "finalizing analyze" phases based on relids in the case
>>> of partitioning tables. Would it better to fix the document? or it
>>> would be better to reduce it to one?
>>>
>>> <Document>
>>> ---------------------------------------------------------
>>>        <entry><literal>finalizing analyze</literal></entry>
>>>        <entry>
>>>          The command is updating pg_class. When this phase is completed,
>>>          <command>ANALYZE</command> will end.
>>> ---------------------------------------------------------
>>
>> When a partitioned table is analyzed, its partitions are analyzed too.
>> So, the ANALYZE command effectively runs N + 1 times if there are N
>> partitions -- first analyze partitioned table to collect "inherited"
>> statistics by collecting row samples using
>> acquire_inherited_sample_rows(), then each partition to collect its
>> own statistics.  Note that this recursive application to ANALYZE to
>> partitions (child tables) only occurs for partitioned tables, not for
>> legacy inheritance.
>
> Thanks for your explanation.
> I understand Analyzing Partitioned table a little.

It would be better to modify the document of "finalizing analyze" phase.

# Before modify
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.

# Modified
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end. In the case of partitioned table,
it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)

Thanks,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2019-12-06 06:34:01 Windows UTF-8, non-ICU collation trouble
Previous Message Michael Paquier 2019-12-06 06:22:42 Re: Removal of support for OpenSSL 0.9.8 and 1.0.0