Re: ANALYZE command progress checker

From: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-03-29 08:38:14
Message-ID: f4d42459-fb30-cfe6-378f-83b9df37fbed@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2017/03/25 4:30, Robert Haas wrote:
> On Fri, Mar 24, 2017 at 3:41 AM, vinayak
> <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I have updated the patch.
> You can't change the definition of AcquireSampleRowsFunc without
> updating the documentation in fdwhandler.sgml, but I think I don't
> immediately understand why that's a thing we want to do at all if
> neither file_fdw nor postgres_fdw are going to use the additional
> argument. It seems to be entirely pointless because the new value
> being passed down is always zero and even if the callee were to update
> it, do_analyze_rel() would just ignore the change on return. Am I
> missing something here? Also, the whitespace-only changes to fdwapi.h
> related to AcquireSampleRowsFunc going to get undone by pgindent, so
> even if there's some reason to change that you should leave the lines
> that don't need changing untouched.
I Understood that we could not change the definition of
AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
In the last patch I have modified the definition of
AcquireSampleRowsFunc to handle the inheritance case.
If the table being analyzed has one or more children, ANALYZE will
gather statistics twice:
once on the rows of parent table only and second on the rows of the
parent table with all of its children.
So while reporting the progress the value of number of target sample
rows and number of rows sampled is mismatched.
For single relation it works fine.

In the attached patch I have not change the definition of
AcquireSampleRowsFunc.
I have updated inheritance case in the the documentation.

>
> + /* Reset rows processed to zero for the next column */
> + pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED,
> 0);
>
> This seems like a bad design. If you see a value other than zero in
> this field, you still won't know how much progress analyze has made,
> because you don't know which column you're processing. I also feel
> like you're not paying enough attention to a key point here that I
> made in my last review, which is that you need to look at where
> ANALYZE actually spends most of the time. If the process of computing
> the per-row statistics consumes significant CPU time, then maybe
> there's some point in trying to instrument it, but does it? Unless
> you know the answer to that question, you don't know whether there's
> even a point to trying to instrument this.
>
Understood. The computing statistics phase takes long time so I am
looking at the code.

Regards,
Vinayak Pokale
NTT Open Source Software Center

Attachment Content-Type Size
pg_stat_progress_analyze_v7.patch binary/octet-stream 13.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-03-29 08:38:41 Re: pg_dump emits ALTER TABLE ONLY partitioned_table
Previous Message Kyotaro HORIGUCHI 2017-03-29 08:36:30 Re: [patch] reorder tablespaces in basebackup tar stream for backup_label