Re: progress report for ANALYZE

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp
Cc: alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, rjuju123(at)gmail(dot)com, anowocien(at)gmail(dot)com, robertmhaas(at)gmail(dot)com
Subject: Re: progress report for ANALYZE
Date: 2019-07-22 08:30:39
Message-ID: 20190722.173039.59246943.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec(at)nttcom(dot)co(dot)jp>
> >> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> >> fixed. :)
> >> ---------------------------------------------------------
> >
> > I have some comments.
> > + const int index[] = {
> > + PROGRESS_ANALYZE_PHASE,
> > + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> > + PROGRESS_ANALYZE_BLOCKS_DONE
> > + };
> > + const int64 val[] = {
> > + PROGRESS_ANALYZE_PHASE_ANALYSIS,
> > + 0, 0
> > Do you oppose to leaving the total/done blocks alone here:p?
>
>
> Thanks for your comments!
> I created a new patch based on your comments because Alvaro allowed me
> to work on revising the patch. :)
>
>
> Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

> > + BlockNumber nblocks;
> > + double blksdone = 0;
> > Why is it a double even though blksdone is of the same domain
> > with nblocks? And finally:
> > + pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> > + ++blksdone);
> > It is converted into int64.
>
>
> Fixed.
> But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

> > + WHEN 2 THEN 'analyzing sample'
> > + WHEN 3 THEN 'analyzing sample (extended stats)'
> > I think we should avoid parenthesized phrases as far as
> > not-necessary. That makes the column unnecessarily long. The
> > phase is internally called as "compute stats" so *I* would prefer
> > something like the followings:
> > + WHEN 2 THEN 'computing stats'
> > + WHEN 3 THEN 'computing extended stats'
> > + WHEN 4 THEN 'analyzing complete'
> > And if you are intending by this that (as mentioned in the
> > documentation) "working to complete this analyze", this would be:
> > + WHEN 4 THEN 'completing analyze'
> > + WHEN 4 THEN 'finalizing analyze'
>
>
> I have no strong opinion, so I changed the phase-names based on
> your suggestions like following:
>
> WHEN 2 THEN 'computing stats'
> WHEN 3 THEN 'computing extended stats'
> WHEN 4 THEN 'finalizing analyze'
>
> However, I'd like to get any comments from hackers to get a consensus
> about the names.

Agreed. Especially such word choosing is not suitable for me..

> > + <entry>Process ID of backend.</entry>
> > of "the" backend. ? And period is not attached on all
> > descriptions consisting of a single sentence.
> >
> > + <entry>OID of the database to which this backend is
> > connected.</entry>
> > + <entry>Name of the database to which this backend is
> > connected.</entry>
> > "database to which .. is connected" is phrased as "database this
> > backend is connected to" in pg_stat_acttivity. (Just for consistency)
>
>
> I checked the sentences on other views of progress monitor (VACUUM,
> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
> I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

> > + <entry>Whether the current scan includes legacy inheritance
> > children.</entry>
> > This apparently excludes partition tables but actually it is
> > included.
> >
> > "Whether scanning through child tables" ?
> > I'm not sure "child tables" is established as the word to mean
> > both "partition tables and inheritance children"..
>
>
> Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..

> > + The table being scanned (differs from <literal>relid</literal>
> > + only when processing partitions or inheritance children).
> > Is <literal> needed? And I think the parentheses are not needed.
> > OID of the table currently being scanned. Can differ from relid
> > when analyzing tables that have child tables.
>
>
> How about:
> OID of the table currently being scanned.
> It might be different from relid when analyzing tables that have child
> tables.
>
>
>
> > + Total number of heap blocks to scan in the current table.
> > Number of heap blocks on scanning_table to scan?
> > It might be better that this description describes that this
> > and the next column is meaningful only while the phase
> > "scanning table".
>
>
> How about:
> Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

> > + The command is currently scanning the table.
> > "sample(s)" comes somewhat abruptly in the next item. Something
> > like "The command is currently scanning the table
> > <structfield>scanning_table</structfield> to obtain samples"
> > might be better.
>
>
> Fixed.
>
>
> > + <command>ANALYZE</command> is currently extracting statistical data
> > + from the sample obtained.
> > Something like "The command is computing stats from the samples
> > obtained in the previous phase" might be better.
>
>
> Fixed.
>
>
> Please find attached patch. :)
>
> Here is a test result of the patch.
> ==========================================================
> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>
> 9067|13599|postgres|16387|f|16387|scanning table|443|14
> 9067|13599|postgres|16387|f|16387|scanning table|443|44
> 9067|13599|postgres|16387|f|16387|scanning table|443|76
> 9067|13599|postgres|16387|f|16387|scanning table|443|100
> ...
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
> ==========================================================

Looks fine!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-07-22 08:34:13 Re: POC: converting Lists into arrays
Previous Message Heikki Linnakangas 2019-07-22 08:30:37 Re: Memory Accounting