Re: progress report for ANALYZE

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>
Subject: Re: progress report for ANALYZE
Date: 2019-08-13 05:33:34
Message-ID: 73116c0a-bdc7-7c1d-b8da-a1e6040d015a@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert and All!

On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
>> <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote:
>>> Attached v4 patch file only includes this fix.
>>
>> I've moved this to the September CF, where it is in "Needs review" state.
>
> /me reviews.

Thanks for your comments! :)

> + <entry><structfield>scanning_table</structfield></entry>
>
> I think this should be retitled to something that ends in 'relid',
> like all of the corresponding cases in existing progress views.
> Perhaps 'active_relid' or 'current_relid'.

Fixed.
I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml.
However, I didn't change "relid" on other places for other commands because
I'd like to create other patch for that later. :)

> + The command is computing extended stats from the samples
> obtained in the previous phase.
>
> I think you should change this (and the previous one) to say "from the
> samples obtained during the table scan."

Fixed.

> + Total number of heap blocks in the scanning_table.
>
> Perhaps I'm confused, but it looks to me like what you are advertising
> is the number of blocks that will be sampled, not the total number of
> blocks in the table. I think that's the right thing to advertise, but
> then the column should be named and documented that way.

Ah, you are right. Fixed.
I used the following sentence based on Vinayak's patch created two years a go.

- <entry><structfield>heap_blks_total</structfield></entry>
- <entry><type>bigint</type></entry>
- <entry>
- Total number of heap blocks in the current_relid.
- </entry>

+ <entry><structfield>sample_blks_total</structfield></entry>
+ <entry><type>bigint</></entry>
+ <entry>
+ Total number of heap blocks that will be sampled.
+</entry>

> + {
> + const int index[] = {
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_SCANREL
> + };
> + const int64 val[] = {
> + nblocks,
> + RelationGetRelid(onerel)
> + };
> +
> + pgstat_progress_update_multi_param(2, index, val);
> + }
>
> This block seems to be introduced just so you can declare variables; I
> don't think that's good style. It's arguably unnecessary because we
> now are selectively allowing variable declarations within functions,
> but I think you should just move the first array to the top of the
> function and the second declaration to the top of the function
> dropping const, and then just do val[0] = nblocks and val[1] =
> RelationGetRelid(onerel). Maybe you can also come up with better
> names than 'index' and 'val'. Same comment applies to another place
> where you have something similar.

I agreed and fixed.

> Patch seems to need minor rebasing.
>
> Maybe "scanning table" should be renamed "acquiring sample rows," to
> match the names used in the code?

I fixed as following:

s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/
PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/

s/WHEN 1 THEN 'scanning table'::text/
WHEN 1 THEN 'acquiring sample rows'::text/

> I'm not a fan of the way you set the scan-table phase and inh flag in
> one place, and then slightly later set the relation OID and block
> count. That creates a race during which users could see the first bit
> of data set and the second not set. I don't see any reason not to set
> all four fields together.

Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas? :)

# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);
else
numrows = (*acquirefunc) (onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);

# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
/* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
&trows, &tdrows)

> Please be sure to make the names of the constants you use match up
> with the external names as far as it reasonably makes sense, e.g.
>
> +#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE 1
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING 2
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
> +#define PROGRESS_ANALYZE_PHASE_FINALIZE 4
>
> vs.
>
> + WHEN 0 THEN 'initializing'::text
> + WHEN 1 THEN 'scanning table'::text
> + WHEN 2 THEN 'computing stats'::text
> + WHEN 3 THEN 'computing extended stats'::text
> + WHEN 4 THEN 'finalizing analyze'::text
>
> Not terrible, but it could be closer.

Agreed.
How about these?:

#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS 1 <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS 2 <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS 3 <- fixed
#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE 4 <- fixed

vs.

WHEN 1 THEN 'acquiring sample rows'::text
WHEN 2 THEN 'computing stats'::text
WHEN 3 THEN 'computing extended stats'::text
WHEN 4 THEN 'finalizing analyze'::text

I revised the name of the constants, so the constants and the phase
names are closer than before. Also, I used Verb instead Gerund
because other phase names used Verb such as VACUUM. :)

> Similarly with the column names (include_children vs. INH).

Fixed.
I selected "include_children" as the column name because it's
easy to understand than "INH".

s/PROGRESS_ANALYZE_INH/
PROGRESS_ANALYZE_INCLUDE_CHILDREN/

Please find attached file. :)

Thanks,
Tatsuro Yamada

Attachment Content-Type Size
v5-Report-progress-for-ANALYZE.patch text/plain 15.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2019-08-13 05:34:37 Re: Global temporary tables
Previous Message Dilip Kumar 2019-08-13 05:22:05 Re: POC: Cleaning up orphaned files using undo logs