Re: ANALYZE command progress checker

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-03-19 08:56:07
Message-ID: CAE9k0P=k7vBz1ywYFUgPDsGJxpfx_dPcFEoWvinefOgVhV9DfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

I didn't find any major issues with the patch. It works as expected.
However, I have few minor comments that I would like to share,

+ <entry>
+ Total number of sample rows. The sample it reads is taken randomly.
+ Its size depends on the <varname>default_statistics_target</>
parameter value.
+ </entry>

1) I think it would be better if you could specify reference link to
the guc variable 'default_statistics_target'. Something like this,

<xref linkend='guc-default-statistics-target'>.

If you go through monitoring.sgml, you would find that there is
reference link to all guc variables being used.

2) I feel that the 'computing_statistics' phase could have been
splitted into 'computing_column_stats', 'computing_index_stats'....
Please let me know your thoughts on this.

+ certain commands during command execution. Currently, the command
+ which supports progress reporting is <command>VACUUM</> and
<command>ANALYZE</>. This may be
expanded in the future.

3) I think above needs to be rephrased. Something like...Currently,
the supported progress reporting commands are 'VACUUM' and
'ANALYZE'.

Moreover, I also ran your patch on Windows platform and didn't find
any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Sat, Mar 18, 2017 at 5:30 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> Hi Vinayak,
>
> Here are couple of review comments that may need your attention.
>
> 1. Firstly, I am seeing some trailing whitespace errors when trying to
> apply your v3 patch using git apply command.
>
> [ashu(at)localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
> pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
> CREATE VIEW pg_stat_progress_analyze AS
> pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
> SELECT
> pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
> S.pid AS pid, S.datid AS datid, D.datname AS datname,
> pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
> S.relid AS relid,
> pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
> CASE S.param1 WHEN 0 THEN 'initializing'
> error: patch failed: doc/src/sgml/monitoring.sgml:521
>
> 2) The test-case 'rules' is failing. I think it's failing because in
> rules.sql 'ORDERBY viewname' is used which will put
> 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
> list of catalog views. You may need to correct rules.out file
> accordingly. This is what i could see in rules.sql,
>
> SELECT viewname, definition FROM pg_views WHERE schemaname <>
> 'information_schema' ORDER BY viewname;
>
> I am still reviewing your patch and doing some testing. Will update if
> i find any issues. Thanks.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com
>
> On Fri, Mar 17, 2017 at 3:16 PM, vinayak
> <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>> On 2017/03/17 10:38, Robert Haas wrote:
>>>
>>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>>> <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>
>>>> Thank you for reviewing the patch.
>>>>
>>>> The attached patch incorporated Michael and Amit comments also.
>>>
>>> I reviewed this tonight.
>>
>> Thank you for reviewing the patch.
>>>
>>> + /* Report compute index stats phase */
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>>
>>> Hmm, is there really any point to this? And is it even accurate? It
>>> doesn't look to me like we are computing any index statistics here;
>>> we're only allocating a few in-memory data structures here, I think,
>>> which is not a statistics computation and probably doesn't take any
>>> significant time.
>>>
>>> + /* Report compute heap stats phase */
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>>
>>> The phase you label as computing heap statistics also includes the
>>> computation of index statistics. I wouldn't bother separating the
>>> computation of heap and index statistics; I'd just have a "compute
>>> statistics" phase that begins right after the comment that starts
>>> "Compute the statistics."
>>
>> Understood. Fixed in the attached patch.
>>>
>>>
>>> + /* Report that we are now doing index cleanup */
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>>
>>> OK, this doesn't make any sense either. We are not cleaning up the
>>> indexes here. We are just closing them and releasing resources. I
>>> don't see why you need this phase at all; it can't last more than some
>>> tiny fraction of a second. It seems like you're trying to copy the
>>> exact same phases that exist for vacuum instead of thinking about what
>>> makes sense for ANALYZE.
>>
>> Understood. I have removed this phase.
>>>
>>>
>>> + /* Report number of heap blocks scaned so far*/
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
>>> targblock);
>>>
>>> While I don't think this is necessarily a bad thing to report, I find
>>> it very surprising that you're not reporting what seems to me like the
>>> most useful thing here - namely, the number of blocks that will be
>>> sampled in total and the number of those that you've selected.
>>> Instead, you're just reporting the length of the relation and the
>>> last-sampled block, which is a less-accurate guide to total progress.
>>> There are enough counters to report both things, so maybe we should.
>>>
>>> + /* Report total number of sample rows*/
>>> + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS,
>>> numrows);
>>>
>>> As an alternative to the previous paragraph, yet another thing you
>>> could report is number of rows sampled out of total rows to be
>>> sampled. But this isn't the way to do it: you are reporting the
>>> number of rows you sampled only once you've finished sampling. The
>>> point of progress reporting is to report progress -- that is, to
>>> report this value as it's updated, not just to report it when ANALYZE
>>> is practically done and the value has reached its maximum.
>>
>> Understood.
>> I have reported number of rows sampled out of total rows to be sampled.
>> I have not reported the number of blocks that will be sampled in total.
>> So currect pg_stat_progress_analyze view looks like:
>>
>> =# \d+ pg_stat_progress_analyze
>> View "pg_catalog.pg_stat_progress_analyze"
>> Column | Type | Collation | Nullable | Default | Storage
>> | Descripti
>> on
>> ------------------------+---------+-----------+----------+---------+----------+----------
>> ---
>> pid | integer | | | | plain
>> |
>> datid | oid | | | | plain
>> |
>> datname | name | | | | plain
>> |
>> relid | oid | | | | plain
>> |
>> phase | text | | | |
>> extended |
>> num_target_sample_rows | bigint | | | | plain
>> |
>> num_rows_sampled | bigint | | | | plain
>> |
>> View definition:
>> SELECT s.pid,
>> s.datid,
>> d.datname,
>> s.relid,
>> CASE s.param1
>> WHEN 0 THEN 'initializing'::text
>> WHEN 1 THEN 'collecting sample rows'::text
>> WHEN 2 THEN 'computing statistics'::text
>> ELSE NULL::text
>> END AS phase,
>> s.param2 AS num_target_sample_rows,
>> s.param3 AS num_rows_sampled
>> FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
>> param1, param2, p
>> aram3, param4, param5, param6, param7, param8, param9, param10)
>> LEFT JOIN pg_database d ON s.datid = d.oid;
>>
>> Comment?
>>>
>>> The documentation for this patch isn't very good, either. You forgot
>>> to update the part of the documentation that says that VACUUM is the
>>> only command that currently supports progress reporting, and the
>>> descriptions of the phases and progress counters are brief and not
>>> entirely accurate - e.g. heap_blks_scanned is not actually the number
>>> of heap blocks scanned, because we are sampling, not reading all the
>>> blocks.
>>
>> Understood. I have updated the documentation.
>> I will also try to improve documentation.
>>>
>>> When adding calls to the progress reporting functions, please consider
>>> whether a blank line should be added before or after the new code, or
>>> both, or neither. I'd say some blank lines are needed in a few places
>>> where you didn't add them.
>>
>> +1. I have added blank lines in a few places.
>>
>> Regards,
>> Vinayak Pokale
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-03-19 09:02:31 Re: PinBuffer() no longer makes use of strategy
Previous Message Pavan Deolasee 2017-03-19 07:15:50 Re: Patch: Write Amplification Reduction Method (WARM)