| From: | Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> | 
|---|---|
| To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> | 
| Cc: | <alvherre(at)2ndquadrant(dot)com>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>, <rjuju123(at)gmail(dot)com>, <anowocien(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Subject: | Re: progress report for ANALYZE | 
| Date: | 2019-07-22 06:02:16 | 
| Message-ID: | 0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi Horiguchi-san!
On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad(at)nttcom(dot)co(dot)jp>
>> Hi Alvaro, Anthony, Julien and Robert,
>>
>> On 2019/07/09 3:47, Julien Rouhaud wrote:
>>> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas(at)gmail(dot)com>
>>> wrote:
>>>>
>>>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
>>>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>>>> Yeah, I got the impression that that was determined to be the
>>>>> desirable
>>>>> behavior, so I made it do that, but I'm not really happy about it
>>>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>>>> let's discuss what is it that we want.
>>>>
>>>> I don't think I intended to make any such determination -- which
>>>> commit do you think established this as the canonical behavior?
>>>>
>>>> I propose that once a field is set, we should leave it set until the
>>>> end.
>>> +1
>>> Note that this patch is already behaving like that if the table only
>>> contains dead rows.
> 
> +1 from me.
> 
>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>> fixed. :)
>> ---------------------------------------------------------
>
> I have some comments.
> 
> +		const int   index[] = {
> +			PROGRESS_ANALYZE_PHASE,
> +			PROGRESS_ANALYZE_TOTAL_BLOCKS,
> +			PROGRESS_ANALYZE_BLOCKS_DONE
> +		};
> +		const int64 val[] = {
> +			PROGRESS_ANALYZE_PHASE_ANALYSIS,
> +			0, 0
> 
> Do you oppose to leaving the total/done blocks alone here:p?
Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)
Ah, I revised it to remove "0, 0".
> +  BlockNumber  nblocks;
> +  double    blksdone = 0;
> 
> Why is it a double even though blksdone is of the same domain
> with nblocks? And finally:
> 
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> +                   ++blksdone);
> 
> It is converted into int64.
Fixed.
But is it suitable to use BlockNumber instead int64?
  
> +                      WHEN 2 THEN 'analyzing sample'
> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> 
> I think we should avoid parenthesized phrases as far as
> not-necessary. That makes the column unnecessarily long. The
> phase is internally called as "compute stats" so *I* would prefer
> something like the followings:
> 
> +                      WHEN 2 THEN 'computing stats'
> +                      WHEN 3 THEN 'computing extended stats'
> 
> 
> 
> +                      WHEN 4 THEN 'analyzing complete'
> 
> And if you are intending by this that (as mentioned in the
> documentation) "working to complete this analyze", this would be:
> 
> +                      WHEN 4 THEN 'completing analyze'
> +                      WHEN 4 THEN 'finalizing analyze'
I have no strong opinion, so I changed the phase-names based on
your suggestions like following:
   WHEN 2 THEN 'computing stats'
   WHEN 3 THEN 'computing extended stats'
   WHEN 4 THEN 'finalizing analyze'
However, I'd like to get any comments from hackers to get a consensus
about the names.
> +     <entry>Process ID of backend.</entry>
> 
> of "the" backend. ? And period is not attached on all
> descriptions consisting of a single sentence.
>
> +      <entry>OID of the database to which this backend is connected.</entry>
> +      <entry>Name of the database to which this backend is connected.</entry>
> 
> "database to which .. is connected" is phrased as "database this
> backend is connected to" in pg_stat_acttivity. (Just for consistency)
I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)
> +      <entry>Whether the current scan includes legacy inheritance children.</entry>
> 
> This apparently excludes partition tables but actually it is
> included.
>
>    "Whether scanning through child tables" ?
> 
> I'm not sure "child tables" is established as the word to mean
> both "partition tables and inheritance children"..
Hmm... I'm also not sure but I fixed as you suggested.
> +       The table being scanned (differs from <literal>relid</literal>
> +       only when processing partitions or inheritance children).
> 
> Is <literal> needed? And I think the parentheses are not needed.
> 
>    OID of the table currently being scanned. Can differ from relid
>    when analyzing tables that have child tables.
How about:
   OID of the table currently being scanned.
   It might be different from relid when analyzing tables that have child tables.
> +       Total number of heap blocks to scan in the current table.
> 
>     Number of heap blocks on scanning_table to scan?
> 
> It might be better that this description describes that this
> and the next column is meaningful only while the phase
> "scanning table".
How about:
   Total number of heap blocks in the scanning_table.
> +       The command is currently scanning the table.
> 
> "sample(s)" comes somewhat abruptly in the next item. Something
> like "The command is currently scanning the table
> <structfield>scanning_table</structfield> to obtain samples"
> might be better.
Fixed.
  
> +       <command>ANALYZE</command> is currently extracting statistical data
> +       from the sample obtained.
> 
> Something like "The command is computing stats from the samples
> obtained in the previous phase" might be better.
Fixed.
Please find attached patch. :)
Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;
9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================
Thanks,
Tatsuro Yamada
| Attachment | Content-Type | Size | 
|---|---|---|
| v3-0001-Report-progress-for-ANALYZE.patch | text/plain | 14.9 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Rowley | 2019-07-22 07:24:51 | Re: Tid scan improvements | 
| Previous Message | David Rowley | 2019-07-22 05:44:54 | Re: Index Skip Scan |