Re: pgbench logging broken by time logic changes

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gregory Smith <gregsmithpgsql(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, david(dot)christensen(at)crunchydata(dot)com
Subject: Re: pgbench logging broken by time logic changes
Date: 2021-06-23 03:47:37
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 22, 2021 at 12:06:45PM +0200, Fabien COELHO wrote:
> Attached an updated v8 patch which adds (reinstate) an improved TAP test
> which would have caught the various regressions on logs.

> Given that such tests were removed once before, I'm unsure whether they will
> be acceptable, despite that their usefulness has been clearly demonstrated.
> At least it is for the record. Sigh:-(


This v8 is an addition of the fix for the epoch with the adjustments
for the aggregate reports in the logs. The maths look rather right
after a read and after some tests.

+# note: this test is time sensitive, and may fail on a very
+# loaded host.
+# note: --progress-timestamp is not tested
+my $delay = pgbench(
+ '-T 2 -P 1 -l --aggregate-interval=1 -S -b se(at)2'
+ . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+ . ' -c 3 -r',
+ 0,
+ [ qr{type: multiple},
+ qr{clients: 3},
+ qr{threads: $nthreads},
+ qr{duration: 2 s},
+ qr{script 1: .* select only},
+ qr{script 2: .* select only},
+ qr{statement latencies in milliseconds},
+ qr{FROM pgbench_accounts} ],
+ [ qr{vacuum}, qr{progress: 1\b} ],
+ 'pgbench progress', undef,
+ "--log-prefix=$bdir/001_pgbench_log_1");
Could we make that shorter at 1s? That will shorten the duration of
the test run. It is easy to miss that this test is for
--aggregate-interval (aka the logAgg() path) without a comment.

+# cool check that we are around 2 seconds
+# The rate may results in an unlucky schedule which triggers
+# an early exit, hence the loose bound.
+ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");

The command itself would not fail, but we would just fail on a machine
where the difference in start/stop time is higher than 4 seconds,
right? On RPI-level machines, this could fail reliably. I am not
completely sure what's the additional value we can get from that extra
test, to be honest.

+# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+# only 1 or as many as 3 progress reports per thread.
+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+ qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
Now this one is good and actually stable thanks to the fact that we'd
get at least the final logs, and the main complain we got to discuss
about on this thread was the format of the logs. I would say to give
up on the first test, and keep the second. But, is this regex
correct? Shouldn't we check for six integer fields only with the
first one having a minimal number of digits for the epoch?

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-06-23 03:59:16 Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc
Previous Message Ajin Cherian 2021-06-23 03:39:58 Re: [HACKERS] logical decoding of two-phase transactions