Re: progress report for ANALYZE

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: <alvherre(at)2ndquadrant(dot)com>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>, <rjuju123(at)gmail(dot)com>, <anowocien(at)gmail(dot)com>, <robertmhaas(at)gmail(dot)com>
Subject: Re: progress report for ANALYZE
Date: 2019-07-23 04:51:04
Message-ID: 374df3ef-be5c-6b7b-1ad9-e09d41640e67@nttcom.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,

On 2019/07/22 17:30, Kyotaro Horiguchi wrote:
> Hello.
>
> # It's very good timing, as you came in while I have a time after
> # finishing a quite nerve-wrackig task..
>
> At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec(at)nttcom(dot)co(dot)jp>
>>>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>>>> fixed. :)
>>>> ---------------------------------------------------------
>>>
>>> I have some comments.
>>> + const int index[] = {
>>> + PROGRESS_ANALYZE_PHASE,
>>> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
>>> + PROGRESS_ANALYZE_BLOCKS_DONE
>>> + };
>>> + const int64 val[] = {
>>> + PROGRESS_ANALYZE_PHASE_ANALYSIS,
>>> + 0, 0
>>> Do you oppose to leaving the total/done blocks alone here:p?
>>
>>
>> Thanks for your comments!
>> I created a new patch based on your comments because Alvaro allowed me
>> to work on revising the patch. :)
>>
>>
>> Ah, I revised it to remove "0, 0".
>
> Thanks. For a second I thought that
> PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
> renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE 1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING 2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE 4

Is it Okay?


>>> + BlockNumber nblocks;
>>> + double blksdone = 0;
>>> Why is it a double even though blksdone is of the same domain
>>> with nblocks? And finally:
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
>>> + ++blksdone);
>>> It is converted into int64.
>>
>>
>> Fixed.
>> But is it suitable to use BlockNumber instead int64?
>
> Yeah, I didn't meant that we should use int64 there. Sorry for
> the confusing comment. I meant that blksdone should be of
> BlockNumber.

Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.

>>> + WHEN 2 THEN 'analyzing sample'
>>> + WHEN 3 THEN 'analyzing sample (extended stats)'
>>> I think we should avoid parenthesized phrases as far as
>>> not-necessary. That makes the column unnecessarily long. The
>>> phase is internally called as "compute stats" so *I* would prefer
>>> something like the followings:
>>> + WHEN 2 THEN 'computing stats'
>>> + WHEN 3 THEN 'computing extended stats'
>>> + WHEN 4 THEN 'analyzing complete'
>>> And if you are intending by this that (as mentioned in the
>>> documentation) "working to complete this analyze", this would be:
>>> + WHEN 4 THEN 'completing analyze'
>>> + WHEN 4 THEN 'finalizing analyze'
>>
>>
>> I have no strong opinion, so I changed the phase-names based on
>> your suggestions like following:
>>
>> WHEN 2 THEN 'computing stats'
>> WHEN 3 THEN 'computing extended stats'
>> WHEN 4 THEN 'finalizing analyze'
>>
>> However, I'd like to get any comments from hackers to get a consensus
>> about the names.
>
> Agreed. Especially such word choosing is not suitable for me..

To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)


>>> + <entry>Process ID of backend.</entry>
>>> of "the" backend. ? And period is not attached on all
>>> descriptions consisting of a single sentence.
>>>
>>> + <entry>OID of the database to which this backend is
>>> connected.</entry>
>>> + <entry>Name of the database to which this backend is
>>> connected.</entry>
>>> "database to which .. is connected" is phrased as "database this
>>> backend is connected to" in pg_stat_acttivity. (Just for consistency)
>>
>>
>> I checked the sentences on other views of progress monitor (VACUUM,
>> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
>> I'd like to keep it as is. :)
>
> Oh, I see from where the wordings came. But no periods seen after
> sentense when it is the only one in a description in other system
> views tables. I think the progress views tables should be
> corrected following convention.

Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.


>>> + <entry>Whether the current scan includes legacy inheritance
>>> children.</entry>
>>> This apparently excludes partition tables but actually it is
>>> included.
>>>
>>> "Whether scanning through child tables" ?
>>> I'm not sure "child tables" is established as the word to mean
>>> both "partition tables and inheritance children"..
>>
>>
>> Hmm... I'm also not sure but I fixed as you suggested.
>
> Yeah, I also am not sure the suggestion is good enough as is..
>
>>> + Total number of heap blocks to scan in the current table.
>>> Number of heap blocks on scanning_table to scan?
>>> It might be better that this description describes that this
>>> and the next column is meaningful only while the phase
>>> "scanning table".
>>
>>
>> How about:
>> Total number of heap blocks in the scanning_table.
>
> (For me, ) that seems like it shows blocks including all
> descendents for inheritance parent. But I'm not sure..a

In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)

>> Please find attached patch. :)
>>
>> Here is a test result of the patch.
>> ==========================================================
>> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>>
>> 9067|13599|postgres|16387|f|16387|scanning table|443|14
>> ...
>> 9067|13599|postgres|16387|f|16387|scanning table|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
>> ==========================================================
>
> Looks fine!

I shared a test result using Declarative partitioning table.

==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;

create table hoge2(
a integer
) partition by range(a);

create table hoge2_10000 partition of hoge2
for values from (0) to (10000);

create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);

create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);

create table hoge2_default partition of hoge2 default;

## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';

oid | relname | relpages | reltuples
-------+---------------+----------+-----------
16538 | hoge | 177 | 40001
16541 | hoge2 | 0 | 0
16544 | hoge2_10000 | 45 | 10000
16547 | hoge2_20000 | 45 | 10000
16550 | hoge2_30000 | 45 | 10000
16553 | hoge2_default | 45 | 10001
(6 rows)

select * from pg_stat_progress_analyze ; \watch 0.00001;

27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45

27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45

27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45

27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45

27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45

27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45

27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45

27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================

I'll share test result using partitioning table via
Inheritance tables on next email. :)

Thanks,
Tatsuro Yamada

Attachment Content-Type Size
v4-0001-Report-progress-for-ANALYZE.patch text/plain 14.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleksandr Shulgin 2019-07-23 05:30:10 Re: Queries on QMF to POSTGRE
Previous Message Thomas Munro 2019-07-23 04:50:40 Re: Introduce timeout capability for ConditionVariableSleep