Re: pgbench regression test failure

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench regression test failure
Date: 2017-09-23 17:06:55
Message-ID: 10943.1506186415@sss.pgh.pa.us
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> [...] After another week of buildfarm runs, we have a few more cases of
>> 3 rows of output, and none of more than 3 or less than 1. So I went
>> ahead and pushed your patch. I'm still suspicious of these results, but
>> we might as well try to make the buildfarm green pending investigation
>> of how this is happening.

> Yep. I keep the issue of pgbench tap test determinism on my todo list,
> among other things.

> I think that it should at least be clearer under which conditions (load?
> luck? bug?) the result may be 1 or 3 when 2 are expected, which needs
> some thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-23%2010%3A50%3A01

the critical bit being

# Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
# at /home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 369.
# 'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
# '
# doesn't match '(?^:progress: 1\b)'

# Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
# at t/001_pgbench_with_server.pl line 438.

# Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
# at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.
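
To make the first failure concrete, here is a minimal Python sketch (not the actual TAP test, which is written in Perl) that applies an equivalent of the pattern to the stderr text quoted above. The helper name first_progress_second is hypothetical and exists only for illustration.

```python
import re

# stderr captured in the failing run, copied from the log excerpt above
stderr = """starting vacuum...end.
progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
"""

# Python equivalent of the Perl pattern (?^:progress: 1\b)
expected = re.compile(r"progress: 1\b")


def first_progress_second(text):
    """Return the timestamp of the first progress line, or None if there is none."""
    m = re.search(r"^progress: (\d+(?:\.\d+)?) s", text, re.MULTILINE)
    return float(m.group(1)) if m else None


# The test wants a report at the 1-second mark; this run has none.
matched = expected.search(stderr) is not None
first = first_progress_second(stderr)
print(matched, first)
```

The pattern fails simply because the first progress report arrived at 2.6 s, so no line beginning "progress: 1" was ever printed.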

That's exceeded my patience with this test case, so I've removed it
for the moment. We can put it back as soon as we figure some way
to make it more robust. (BTW, the "-nan" bits suggest an actual
pgbench bug, independently of anything else.)
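
One plausible reading of the "-nan" values, offered as an assumption rather than a diagnosis of pgbench's source, is that the per-interval averages are computed by dividing by the number of transactions in the interval, which is zero in the 3.0 s line. The Python sketch below shows a guarded version of that computation; the function name interval_latency and the choice to report 0.0 for an empty interval are hypothetical.

```python
import math


def interval_latency(sum_ms, sum_sq_ms, count):
    """Return (mean, stddev) of latency for one progress interval.

    sum_ms and sum_sq_ms are the sum and the sum of squares of the
    latencies observed in the interval; count is how many there were.
    """
    if count == 0:
        # Without this guard, C floating-point code would compute
        # 0.0 / 0 = NaN (Python would raise ZeroDivisionError instead).
        # Reporting 0.0 here is an assumption made for this sketch.
        return 0.0, 0.0
    mean = sum_ms / count
    # Clamp at zero so rounding error cannot produce a negative variance.
    variance = max(sum_sq_ms / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# An interval with no transactions no longer yields NaN.
empty = interval_latency(0.0, 0.0, 0)
# Two transactions of 2 ms and 4 ms: sum = 6, sum of squares = 20.
two = interval_latency(6.0, 20.0, 2)
print(empty, two)
```

Whether an empty interval should print 0.0, a placeholder, or nothing at all is a design choice this sketch does not settle.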

Possibly you can duplicate skink's issue by running things under
valgrind.

regards, tom lane
