Re: tap tests on older branches fail if concurrency is used

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tap tests on older branches fail if concurrency is used
Date: 2017-06-07 05:37:19
Message-ID: CAB7nPqQTGveO3_zvnXQAs32cMosX8c76_ewe4L1cNL=1xZmt+g@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 31, 2017 at 8:45 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 1 June 2017 at 08:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> when using
>> $ cat ~/.proverc
>> -j9
>>
>> some tests fail for me in 9.4 and 9.5. E.g. src/bin/scripts' tests
>> yield a lot of fun like:
>> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
>> ...
>> # LOG: received immediate shutdown request
>> # WARNING: terminating connection because of crash of another server process
>> # DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> # HINT: In a moment you should be able to reconnect to the database and repeat your command.
>> ...
>>
>> it appears as if various tests are trampling over each other.

They are. I can easily reproduce the problem on my side with:
PROVE_FLAGS="-j 9" make check
It would be nice to have a minimum of stability for those tests in the
back-branches, even if PostgresNode.pm is not back-patched.

> The immediate problem appears to be that they all use
> tmp_check/postmaster.log . So anything that examines the logs gets
> confused by seeing some other postgres instance's logs, or a mixture,
> trampling everywhere.

Amen.

> I'll be surprised if there aren't other problems though. Rather than
> trying to fix it all up, this seems like a good argument for
> backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
> I already have a working tree with that done to use src/test/recovery
> in 9.5, but haven't updated src/bin/scripts etc yet.

Yup. Even if PostgresNode.pm is not back-patched, a small trick is to
append the PID of the process running the TAP test to the log file
name, as in the attached patch. This gives enough uniqueness for the
tests to pass with a high degree of parallelism.
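
To illustrate the idea only (this is not the attached patch, which
touches the Perl test code), here is a minimal shell sketch. The helper
name log_path_for_pid and the "_<pid>" suffix format are assumptions
made for the example; only the tmp_check/postmaster.log base name comes
from the thread.

```shell
#!/bin/sh
# Hypothetical helper: build a per-process log file path.
# The "_<pid>" suffix format is an assumption for illustration;
# it is not taken from the attached patch.
log_path_for_pid() {
    dir=$1
    pid=$2
    printf '%s/postmaster_%s.log\n' "$dir" "$pid"
}

# $$ expands to the PID of the running shell, so two test scripts
# running at the same time end up with different log file names
# instead of sharing tmp_check/postmaster.log.
log_path_for_pid tmp_check "$$"
```

Since two processes alive at the same time never share a PID, the
suffix is enough to keep concurrent tests from writing to, or reading
from, each other's log.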

A second problem I spotted is in the pg_rewind tests, which fail when
run in parallel because every test uses the same data folders. Applying
the same trick with $$ (the PID) makes those tests more stable.

A third error is a failure in contrib/test_decoding, and this has been
addressed by Andres in 60f826c.

Attached is a patch for the first two, which makes the tests more
robust. I am annoyed myself by parallel tests failing when I work on
patches for back-branches, so having at least a minimal fix would be
nice.
--
Michael

Attachment: tap-stability-95.patch (application/octet-stream, 1.8 KB)
