Re: progress report for ANALYZE

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: <alvherre(at)2ndquadrant(dot)com>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>, <rjuju123(at)gmail(dot)com>, <anowocien(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: progress report for ANALYZE
Date: 2019-07-22 06:02:16
Message-ID: 0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Horiguchi-san!

On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
> Hello.
>
> At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad(at)nttcom(dot)co(dot)jp>
>> Hi Alvaro, Anthony, Julien and Robert,
>>
>> On 2019/07/09 3:47, Julien Rouhaud wrote:
>>> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas(at)gmail(dot)com>
>>> wrote:
>>>>
>>>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
>>>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>>>> Yeah, I got the impression that that was determined to be the
>>>>> desirable
>>>>> behavior, so I made it do that, but I'm not really happy about it
>>>>> either. We're not too late to change the CREATE INDEX behavior, but
>>>>> let's discuss what is it that we want.
>>>>
>>>> I don't think I intended to make any such determination -- which
>>>> commit do you think established this as the canonical behavior?
>>>>
>>>> I propose that once a field is set, we should leave it set until the
>>>> end.
>>> +1
>>> Note that this patch is already behaving like that if the table only
>>> contains dead rows.
>
> +1 from me.
>
>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>> fixed. :)
>> ---------------------------------------------------------
>
> I have some comments.
>
> + const int index[] = {
> + PROGRESS_ANALYZE_PHASE,
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_BLOCKS_DONE
> + };
> + const int64 val[] = {
> + PROGRESS_ANALYZE_PHASE_ANALYSIS,
> + 0, 0
>
> Do you oppose to leaving the total/done blocks alone here:p?

Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)

Ah, I revised it to remove "0, 0".

> + BlockNumber nblocks;
> + double blksdone = 0;
>
> Why is it a double even though blksdone is of the same domain
> with nblocks? And finally:
>
> + pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> + ++blksdone);
>
> It is converted into int64.

Fixed.
But is it suitable to use BlockNumber instead int64?


> + WHEN 2 THEN 'analyzing sample'
> + WHEN 3 THEN 'analyzing sample (extended stats)'
>
> I think we should avoid parenthesized phrases as far as
> not-necessary. That makes the column unnecessarily long. The
> phase is internally called as "compute stats" so *I* would prefer
> something like the followings:
>
> + WHEN 2 THEN 'computing stats'
> + WHEN 3 THEN 'computing extended stats'
>
>
>
> + WHEN 4 THEN 'analyzing complete'
>
> And if you are intending by this that (as mentioned in the
> documentation) "working to complete this analyze", this would be:
>
> + WHEN 4 THEN 'completing analyze'
> + WHEN 4 THEN 'finalizing analyze'

I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.

> + <entry>Process ID of backend.</entry>
>
> of "the" backend. ? And period is not attached on all
> descriptions consisting of a single sentence.
>
> + <entry>OID of the database to which this backend is connected.</entry>
> + <entry>Name of the database to which this backend is connected.</entry>
>
> "database to which .. is connected" is phrased as "database this
> backend is connected to" in pg_stat_acttivity. (Just for consistency)

I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)

> + <entry>Whether the current scan includes legacy inheritance children.</entry>
>
> This apparently excludes partition tables but actually it is
> included.
>
> "Whether scanning through child tables" ?
>
> I'm not sure "child tables" is established as the word to mean
> both "partition tables and inheritance children"..

Hmm... I'm also not sure but I fixed as you suggested.

> + The table being scanned (differs from <literal>relid</literal>
> + only when processing partitions or inheritance children).
>
> Is <literal> needed? And I think the parentheses are not needed.
>
> OID of the table currently being scanned. Can differ from relid
> when analyzing tables that have child tables.

How about:
OID of the table currently being scanned.
It might be different from relid when analyzing tables that have child tables.

> + Total number of heap blocks to scan in the current table.
>
> Number of heap blocks on scanning_table to scan?
>
> It might be better that this description describes that this
> and the next column is meaningful only while the phase
> "scanning table".

How about:
Total number of heap blocks in the scanning_table.

> + The command is currently scanning the table.
>
> "sample(s)" comes somewhat abruptly in the next item. Something
> like "The command is currently scanning the table
> <structfield>scanning_table</structfield> to obtain samples"
> might be better.

Fixed.


> + <command>ANALYZE</command> is currently extracting statistical data
> + from the sample obtained.
>
> Something like "The command is computing stats from the samples
> obtained in the previous phase" might be better.

Fixed.

Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================

Thanks,
Tatsuro Yamada

Attachment Content-Type Size
v3-0001-Report-progress-for-ANALYZE.patch text/plain 14.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-07-22 07:24:51 Re: Tid scan improvements
Previous Message David Rowley 2019-07-22 05:44:54 Re: Index Skip Scan