Re: Progress reporting for pg_verify_checksums

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, alvherre(at)2ndquadrant(dot)com, mailings(at)oopsware(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress reporting for pg_verify_checksums
Date: 2019-03-27 20:31:30
Message-ID: alpine.DEB.2.21.1903272111370.7287@lancre
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

>> I still think that the speed should compute a double to avoid integer
>> rounding errors within the computation. ISTM that rounding should only be
>> done for display in the end.
>
> Ok, also done this way. I decided to print only full MB and not any
> further digits as those don't seem very relevant.

For the computation to be in double, you must convert to double somewhere
in the formula, otherwise it is computed as ints and only cast as a double
afterwards. Maybe:

current_speed = (1000.0 / MEGABYTES) * current_size / elapsed;

Or anything which converts to double early.

Instead of casting percent to int, maybe use "%.0f" as well, just like
current_speed?

+ if ((blockno % 100 == 0) && show_progress)

I'd invert the condition to avoid a modulo when not needed.

I'm not very found of global variables, but it does not matter here and
doing it differently would involve more code. It would matter though if
the progress report would be shared between commands.

>> I'm okay with calling the report on each file even if this means every few
>> GB...
>
> For my I/O throttling branch I've backed off to only call the report
> every 100 blocks (and at the end of the file). I've added that logic to
> this patch as well.

The 100 blocks = 800 KiB looks arbitrary. What is the rational? ISTM that
postgres will tend to store a power of two number of pages on full
segments, so maybe there could be a rational for 1024. Not sure.

>> Someone suggested ETA, and it seems rather simple to do. What about
>> adding it?
>
> I don't know, just computing the ETA is easy of course. But you'd have
> to fiddle with seconds/minutes/hours conversions cause the runtime could
> be hours. Then, you don't want to have it bumping around weirdly when
> your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sounds
> simpler than the signal trigger we might get rid of.

Ok, it is more complicated that it looks for large sizes if second is not
the right display unit.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2019-03-27 20:36:08 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message legrand legrand 2019-03-27 20:17:02 RE: minimizing pg_stat_statements performance overhead