Re: Testlib.pm vs msys

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Testlib.pm vs msys
Date: 2017-07-23 16:43:25
Message-ID: 11524.1500828205@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> It turns out I was wrong about the problem jacana has been having with
> the pg_ctl tests hanging. The problem was not the use of select as a
> timeout mechanism, although I think the change to using
> Time::HiRes::usleep() is correct and shouldn't be reverted.

> The problem is command_like's use of redirection to strings. Why this
> should be a problem for this particular use is a matter of speculation.
> I suspect it's to do with the fact that in this instance pg_ctl is
> leaving behind some child processes (i.e. postmaster and children) after
> it exits, and so on this particular path IPC::Run isn't detecting the
> exit properly. The workaround I have found to work is to redirect
> command_like's output instead to a couple of files and then slurp in
> those files and delete them. A bit hacky, I know, so I'm open to other
> suggestions.

Yeah, I'd been eyeing that behavior of IPC::Run a month or so back,
though from the opposite direction. If you are reading either stdout
or stderr of the executed command into Perl, then it detects command
completion by waiting till it gets EOF on those stream(s). If you
are reading neither, then it goes into this wonky backoff behavior
where it sleeps a bit and then checks waitpid(WNOHANG), with the
value of "a bit" continually increasing until it reaches a fairly
large value, half a second or a second (I forget). So you have
potentially some sizable fraction of a second that's just wasted after
command termination. I'd been able to make a small but noticeable
improvement in the runtime of some of our TAP test suites by forcing
the first behavior, i.e. reading stdout even if we were going to throw
it away.
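
The two completion-detection strategies described above can be sketched
roughly as follows. This is a Python analogue written for illustration,
not IPC::Run's actual code: the function names are hypothetical, and the
backoff values (10 ms doubling up to 0.5 s) are assumptions chosen only
to mirror the "half a second or a second" recollection.

```python
import subprocess
import sys
import time


def wait_by_eof(argv):
    """Capture stdout; completion is noticed when the pipe reaches EOF."""
    start = time.monotonic()
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    # read() returns only once every holder of the pipe's write end has closed it.
    output = proc.stdout.read()
    proc.stdout.close()
    status = proc.wait()  # the child has normally exited by now; this reaps it
    return status, output, time.monotonic() - start


def wait_by_polling(argv, first_delay=0.01, max_delay=0.5):
    """Capture nothing; sleep, then check for exit, with a growing delay."""
    start = time.monotonic()
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    delay = first_delay  # assumed starting value, not taken from IPC::Run
    # On POSIX, poll() is a non-blocking waitpid(WNOHANG) check.
    while proc.poll() is None:
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    return proc.returncode, time.monotonic() - start


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('hello')"]
    print("EOF-based:", wait_by_eof(cmd))
    print("polling:  ", wait_by_polling(cmd))
```

With the polling variant, a command that exits just after a check is not
noticed until the next sleep ends, which is where the wasted fraction of
a second comes from.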

So I'm not really that excited about going in the other direction ;-).
It shouldn't matter much time-wise for short-lived commands, but it's
disturbing if the EOF technique fails entirely for some cases.

I looked at jacana's two recent pg_ctlCheck failures, and they both
seem to have failed on this:

    command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
                   '-l', "$TestLib::log_path/001_start_stop_server.log" ],
                 qr/done.*server started/s, 'pg_ctl start');

That is redirecting the postmaster's stdout/stderr into a file,
for sure, so the child processes shouldn't impact EOF detection AFAICS.
It's also hard to explain this way why it only fails some of the time.
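
To make the EOF argument concrete, here is a small Python sketch (again an
analogue, not pg_ctl or IPC::Run) of a command that exits immediately but
leaves a background process behind. The helper name time_to_eof, the
"inherit"/"redirect" modes, and the 2-second sleep are all hypothetical.
On a POSIX system, when the leftover process inherits the captured stdout,
EOF should not arrive until it exits; when its output is redirected
elsewhere, as "-l" does for the postmaster, EOF should arrive as soon as
the command itself exits.

```python
import subprocess
import sys
import time

# Stand-in for the launched command: it starts a long-lived background
# process, prints a message, and exits right away.
CHILD = r"""
import subprocess, sys
mode = sys.argv[1]
# "inherit": the background process keeps our stdout (the parent's pipe) open.
# "redirect": its output goes elsewhere, so it holds no copy of the pipe.
out = None if mode == "inherit" else subprocess.DEVNULL
subprocess.Popen([sys.executable, "-c", "import time; time.sleep(2.0)"],
                 stdout=out, stderr=out)
print("server started")
"""


def time_to_eof(mode):
    """Return (captured output, seconds until EOF was seen on stdout)."""
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode],
                            stdout=subprocess.PIPE)
    output = proc.stdout.read()  # blocks until all writers close the pipe
    proc.stdout.close()
    proc.wait()
    return output, time.monotonic() - start


if __name__ == "__main__":
    for mode in ("inherit", "redirect"):
        out, elapsed = time_to_eof(mode)
        print(f"{mode:8s} EOF after {elapsed:.2f}s, output={out!r}")
```

If the redirected case behaves as expected, lingering children alone do
not explain a hang in the command_like call above.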

I think we need to look at what the recent changes were in this area
and try to form a better theory of why it's started to fail here.

regards, tom lane
