Re: Failed test 'psql query died successfully after SIGQUIT'

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failed test 'psql query died successfully after SIGQUIT'
Date: 2018-01-29 04:48:26
Message-ID: CAEepm=1NYXjGrp_bO33YTpMMpG5BXg6j+STTRfp9g2_0sgHPAg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 29, 2018 at 5:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> Does anyone have any thoughts on why Travis CI might be failing
>> occasionally as shown below, but the build farm apparently doesn't?
>> It's not alien technology, it's just Ubuntu Linux running on a virtual
>> machine. One thing that it does slightly differently is "make
>> -Otarget -j3 check-world" but I don't see why -j3 would break
>> something like that specifically, except perhaps by overloading the
>> system in a way that affects timing.
>
> Quite a few of us use -jsomething with make check-world, so I doubt
> that that's an issue per se. I have no idea what -Otarget means in
> this context though, and neither does "info make".

-Otarget means that it should buffer the output from all the parallel
jobs and then serialise them into per-target chunks, so that you can
read the output instead of getting intermingled lines from concurrent
stuff. I learned this trick from Andres... I think it landed in GNU
make 4.0 (the long form is --output-sync=target), so it's quite new.
The latest web documentation only says that it exists, and doesn't say
what it does!

https://www.gnu.org/software/make/manual/make.html
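
To make the buffering idea above concrete, here is a minimal sketch in Python of the general technique: run jobs in parallel, capture each job's output, and emit it as one contiguous chunk once that job finishes. This is not how GNU make implements -Otarget; the names run_grouped and demo_commands are hypothetical and exist only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_grouped(commands, jobs=3):
    """Run commands in parallel; return (name, output) chunks in completion order.

    `commands` maps a hypothetical target name to an argv list.
    """
    def capture(cmd):
        # Buffer stdout and stderr together instead of writing to the terminal,
        # so lines from concurrent jobs cannot interleave.
        done = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
        return done.stdout

    chunks = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(capture, cmd): name
                   for name, cmd in commands.items()}
        for fut in as_completed(futures):
            chunks.append((futures[fut], fut.result()))
    return chunks


def demo_commands():
    # Three stand-in "targets"; each just prints three lines.
    return {
        name: [sys.executable, "-c",
               f"for i in range(3): print('{name} line', i)"]
        for name in ("a", "b", "c")
    }


if __name__ == "__main__":
    for name, out in run_grouped(demo_commands()):
        print(f"--- {name} ---")
        print(out, end="")
```

The order of the chunks depends on which job finishes first, but each chunk holds only its own job's lines, which is the property that makes parallel output readable.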

>> # aborting wait: program timed out
>
> Hmm -- looking at src/test/recovery/t/013_crash_restart.pl, that
> message comes out if this timer expires:
>
> # To avoid hanging while expecting some specific input from a psql
> # instance being driven by us, add a timeout high enough that it
> # should never trigger even on very slow machines, unless something
> # is really wrong.
> my $psql_timeout = IPC::Run::timer(60);
>
> so in other words, all you need for this failure is for
> 013_crash_restart.pl to take more than 60 seconds (per-command, I think,
> not total). I find the comment's optimism about that being impossible
> rather naive :-(, even if the test does take only a couple of seconds on
> typical modern hardware. I see in the buildfarm that skink takes >240
> seconds ...

Ahh.

> If you can easily experiment with changing that limit to 120 or 180
> seconds, or can measure how long the test takes when it passes, that'd
> be useful info.

Will look into that. Thanks!
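
One rough way to gather that timing information is to wrap the test command in a small timer that reports the elapsed wall-clock time and whether a chosen limit was exceeded. The sketch below is an assumption about how one might do that, not part of the PostgreSQL test suite; run_timed is a hypothetical helper, and the caller supplies whatever command actually runs the test.

```python
import subprocess
import sys
import time


def run_timed(cmd, limit_seconds):
    """Run cmd and return (elapsed_seconds, timed_out).

    The child is killed if it runs longer than limit_seconds.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, timeout=limit_seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    # Placeholder command; substitute the real test invocation.
    elapsed, timed_out = run_timed([sys.executable, "-c", "pass"], 180)
    print(f"elapsed={elapsed:.1f}s timed_out={timed_out}")
```

Running this a number of times on the slow machine, with limits such as 60, 120 and 180 seconds, would show how close a passing run comes to each limit.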

--
Thomas Munro
http://www.enterprisedb.com
