Re: src/test/subscription/t/002_types.pl hanging on particular environment

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: src/test/subscription/t/002_types.pl hanging on particular environment
Date: 2017-09-19 21:20:49
Message-ID: 2769.1505856049@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> This type of violent shutdown seems to be associated with occasional
> corruption of .gcda files (the files output by GCC coverage builds).
> The symptoms are that if you use --enable-coverage and make
> check-world you'll very occasionally get a spurious TAP test failure
> like this:

> # Failed test 'pg_ctl start: no stderr'
> # at /home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm line 301.
> # got: 'profiling:/home/travis/build/postgresql-cfbot/postgresql/src/backend/nodes/copyfuncs.gcda:Merge mismatch for function 94
> # '
> # expected: ''

> I'm not sure of the exact mechanism though. GCC supplies a function
> __gcov_flush() that normally runs at exit or execve, so if you're
> killed without reaching those you don't get any .gcda data. Perhaps
> we are in exit (or fork/exec) partway through writing out coverage
> data in __gcov_flush(), and at that moment we are killed. Then a
> subsequent run of instrumented code will find the half-written file
> and print the "Merge mismatch" message.

On a slow/loaded machine, perhaps it could be that the postmaster loses
patience and SIGKILLs a backend that's still writing its .gcda data?
If so, maybe we could make SIGKILL_CHILDREN_AFTER_SECS longer in
coverage builds? Or bite the bullet and make it configurable ...
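
To make the two options concrete, here is a rough sketch, not actual postmaster code. It assumes the stock SIGKILL_CHILDREN_AFTER_SECS is 5 seconds. The HYPOTHETICAL_COVERAGE_BUILD symbol, the 60-second value, and the should_sigkill_children() helper are invented for illustration only. The timeout is passed as a parameter, which is roughly what the "configurable" variant would look like.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed stock value; treat the number as an assumption, not a quote. */
#define SIGKILL_CHILDREN_AFTER_SECS 5

/*
 * Option 1: a longer compile-time timeout in coverage builds.
 * HYPOTHETICAL_COVERAGE_BUILD is an invented symbol; no claim is made
 * that --enable-coverage defines anything like it.
 */
#ifdef HYPOTHETICAL_COVERAGE_BUILD
#define KILL_TIMEOUT_SECS 60
#else
#define KILL_TIMEOUT_SECS SIGKILL_CHILDREN_AFTER_SECS
#endif

/*
 * Option 2: take the timeout as a run-time value (e.g. from a setting).
 * Hypothetical helper: returns true once the children have had
 * timeout_secs seconds since the abort started and should now be
 * SIGKILLed.  abort_start == 0 means no abort is in progress.
 */
bool
should_sigkill_children(time_t abort_start, time_t now, int timeout_secs)
{
    if (abort_start == 0)
        return false;
    return (now - abort_start) >= timeout_secs;
}
```

With the assumed 5-second default, a backend that needs more than 5 seconds to flush its .gcda data would be killed mid-write; calling the helper with KILL_TIMEOUT_SECS (or a configured value) in place of the constant is what gives it more time.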

regards, tom lane
