Re: ANALYZE command progress checker

From: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-03-17 09:46:23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017/03/17 10:38, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
> <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Thank you for reviewing the patch.
>> The attached patch incorporated Michael and Amit comments also.
> I reviewed this tonight.
Thank you for reviewing the patch.
> + /* Report compute index stats phase */
> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +
> Hmm, is there really any point to this? And is it even accurate? It
> doesn't look to me like we are computing any index statistics here;
> we're only allocating a few in-memory data structures here, I think,
> which is not a statistics computation and probably doesn't take any
> significant time.
> + /* Report compute heap stats phase */
> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> The phase you label as computing heap statistics also includes the
> computation of index statistics. I wouldn't bother separating the
> computation of heap and index statistics; I'd just have a "compute
> statistics" phase that begins right after the comment that starts
> "Compute the statistics."
Understood. Fixed in the attached patch.
> + /* Report that we are now doing index cleanup */
> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> OK, this doesn't make any sense either. We are not cleaning up the
> indexes here. We are just closing them and releasing resources. I
> don't see why you need this phase at all; it can't last more than some
> tiny fraction of a second. It seems like you're trying to copy the
> exact same phases that exist for vacuum instead of thinking about what
> makes sense for ANALYZE.
Understood. I have removed this phase.
> + /* Report number of heap blocks scaned so far*/
> + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
> targblock);
> While I don't think this is necessarily a bad thing to report, I find
> it very surprising that you're not reporting what seems to me like the
> most useful thing here - namely, the number of blocks that will be
> sampled in total and the number of those that you've selected.
> Instead, you're just reporting the length of the relation and the
> last-sampled block, which is a less-accurate guide to total progress.
> There are enough counters to report both things, so maybe we should.
> + /* Report total number of sample rows*/
> + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);
> As an alternative to the previous paragraph, yet another thing you
> could report is number of rows sampled out of total rows to be
> sampled. But this isn't the way to do it: you are reporting the
> number of rows you sampled only once you've finished sampling. The
> point of progress reporting is to report progress -- that is, to
> report this value as it's updated, not just to report it when ANALYZE
> is practically done and the value has reached its maximum.
I have reported number of rows sampled out of total rows to be sampled.
I have not reported the number of blocks that will be sampled in total.
So currect pg_stat_progress_analyze view looks like:

=# \d+ pg_stat_progress_analyze
View "pg_catalog.pg_stat_progress_analyze"
Column | Type | Collation | Nullable | Default |
Storage | Descripti
pid | integer | | | |
plain |
datid | oid | | | |
plain |
datname | name | | | |
plain |
relid | oid | | | |
plain |
phase | text | | | |
extended |
num_target_sample_rows | bigint | | | |
plain |
num_rows_sampled | bigint | | | |
plain |
View definition:
CASE s.param1
WHEN 0 THEN 'initializing'::text
WHEN 1 THEN 'collecting sample rows'::text
WHEN 2 THEN 'computing statistics'::text
END AS phase,
s.param2 AS num_target_sample_rows,
s.param3 AS num_rows_sampled
FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
param1, param2, p
aram3, param4, param5, param6, param7, param8, param9, param10)
LEFT JOIN pg_database d ON s.datid = d.oid;

> The documentation for this patch isn't very good, either. You forgot
> to update the part of the documentation that says that VACUUM is the
> only command that currently supports progress reporting, and the
> descriptions of the phases and progress counters are brief and not
> entirely accurate - e.g. heap_blks_scanned is not actually the number
> of heap blocks scanned, because we are sampling, not reading all the
> blocks.
Understood. I have updated the documentation.
I will also try to improve documentation.
> When adding calls to the progress reporting functions, please consider
> whether a blank line should be added before or after the new code, or
> both, or neither. I'd say some blank lines are needed in a few places
> where you didn't add them.
+1. I have added blank lines in a few places.

Vinayak Pokale

Attachment Content-Type Size
pg_stat_progress_analyze_v3.patch binary/octet-stream 10.8 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2017-03-17 09:55:54 Re: Monitoring roles patch
Previous Message Heikki Linnakangas 2017-03-17 09:40:54 Re: Authentication tests, and plain 'password' authentication with a SCRAM verifier