|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||Michael Banck <michael(dot)banck(at)credativ(dot)de>|
|Cc:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, alvherre(at)2ndquadrant(dot)com, mailings(at)oopsware(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org|
|Subject:||Re: Progress reporting for pg_verify_checksums|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote:
> The way you've now changed this is that there's a function call into
> progress_report() for every block that's being read, even if there is no
> progress reporting requested. That looks like a pretty severe
> performance problem so I suggest to at least stick with checking
> showprogress before calling progress_report() and not the other way
> So my vote is in favour of only calling progress_report() every once in
> a while - I saw quite a speedup (or removal of slowdown) due to this in
> my tests, this was not just some unwarranted microoptimization.
Do you have some runtime numbers? If all the pages are in the OS
cache you should be able to see more impact. I have been testing with
a pgbench database at scale 300 and --check, and perf is showing me
that progress_report counts for 0.10% with or without the option of
the profile while pg_checksum_page() counts for 36%. Switching the
switch in or out of it makes the performance change lost in the noise.
I'd rather keep the check for showprogress within progress_report() so
as extra callers don't miss bypassing the report if --progress is not
enabled, still we are talking about only one caller now so the
attached does it the other way around with an assertion.
>> + fprintf(stderr, "\r");
> I think the isatty() check that was in our versions of the patch is
> useful here, did you check how this looks when piping the output to a
Oops, fixed. The output was strange.
> This hunk is from the performance optimization of calling
> progress_report only every 1024 blocks and could be removed if we stick
> with calling it for every block anyway (but see above).
Indeed, removed this one.
>> -static void
>> -scan_directory(const char *basedir, const char *subdir)
>> + * Scan the given directory for items which can be checksummed and
>> + * operate on each one of them. If "sizeonly" is true, the size of
>> + * all the items which have checksums is computed and returned back
> "computed" is maybe a bit strong a word here, how about "added up"?
"computed" sounds fine to me.
|Next Message||Michael Paquier||2019-04-01 06:39:45||Re: REINDEX CONCURRENTLY 2.0|
|Previous Message||Masahiko Sawada||2019-04-01 05:26:15||Re: New vacuum option to do only freezing|