Re: ANALYZE command progress checker

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-03-17 01:38:40
Message-ID: CA+TgmoaGCwYkk2xiPKnYNqnNQEDuM9L6KYWU8vTb-tVb6BZhaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 10, 2017 at 2:46 AM, vinayak
<Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for reviewing the patch.
>
> The attached patch incorporated Michael and Amit comments also.

I reviewed this tonight.

+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+
PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Hmm, is there really any point to this? And is it even accurate? It
doesn't look to me like we are computing any index statistics here;
we're only allocating a few in-memory data structures here, I think,
which is not a statistics computation and probably doesn't take any
significant time.

+ /* Report compute heap stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The phase you label as computing heap statistics also includes the
computation of index statistics. I wouldn't bother separating the
computation of heap and index statistics; I'd just have a "compute
statistics" phase that begins right after the comment that starts
"Compute the statistics."

+ /* Report that we are now doing index cleanup */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

OK, this doesn't make any sense either. We are not cleaning up the
indexes here. We are just closing them and releasing resources. I
don't see why you need this phase at all; it can't last more than some
tiny fraction of a second. It seems like you're trying to copy the
exact same phases that exist for vacuum instead of thinking about what
makes sense for ANALYZE.

+ /* Report number of heap blocks scaned so far*/
+ pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
targblock);

While I don't think this is necessarily a bad thing to report, I find
it very surprising that you're not reporting what seems to me like the
most useful thing here - namely, the number of blocks that will be
sampled in total and the number of those that you've selected.
Instead, you're just reporting the length of the relation and the
last-sampled block, which is a less-accurate guide to total progress.
There are enough counters to report both things, so maybe we should.

+ /* Report total number of sample rows*/
+ pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);

As an alternative to the previous paragraph, yet another thing you
could report is number of rows sampled out of total rows to be
sampled. But this isn't the way to do it: you are reporting the
number of rows you sampled only once you've finished sampling. The
point of progress reporting is to report progress -- that is, to
report this value as it's updated, not just to report it when ANALYZE
is practically done and the value has reached its maximum.

The documentation for this patch isn't very good, either. You forgot
to update the part of the documentation that says that VACUUM is the
only command that currently supports progress reporting, and the
descriptions of the phases and progress counters are brief and not
entirely accurate - e.g. heap_blks_scanned is not actually the number
of heap blocks scanned, because we are sampling, not reading all the
blocks.

When adding calls to the progress reporting functions, please consider
whether a blank line should be added before or after the new code, or
both, or neither. I'd say some blank lines are needed in a few places
where you didn't add them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-17 01:50:32 Re: [patch] reorder tablespaces in basebackup tar stream for backup_label
Previous Message Michael Paquier 2017-03-17 01:31:26 Re: Renaming of pg_xlog and pg_clog