Re: pg_ctl/pg_rewind tests vs. slow AIX buildfarm members

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_ctl/pg_rewind tests vs. slow AIX buildfarm members
Date: 2015-09-26 00:12:45
Message-ID: 5525.1443226365@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Attached is a draft patch for this. I think it's fine for Unix (unless
> someone wants to object to relying on "/bin/sh -c"), but I have no idea
> whether it works for Windows. The main risk is that if CMD.EXE runs
> the postmaster as a subprocess rather than overlaying itself a la shell
> "exec", the PID we'd get back would apply only to CMD.EXE not to the
> eventual postmaster. If so, I do not know how to fix that, or whether
> it's fixable at all.

> Note that this makes the test case in question fail reliably, which is
> reasonable behavior IMO so I just changed the test.

> If this does (or can be made to) work on Windows, I'd propose applying
> it back to 9.2, where the current coding came in.

Nobody seems to have stepped up to fix the Windows side of this, but
after some more thought it occurred to me that maybe it doesn't need
(much) fixing. There are two things that pg_ctl wants the PID for:

1. To check if the postmaster.pid file contains the expected child process
PID. Well, that's a nice-to-have, but we never had it before and got
along well enough, because the start-timestamp comparison check covers
most real-world cases. We can omit the PID-match check on Windows.

2. To see whether the child postmaster process is still running, via a
"kill(pid, 0)" test. But if we've launched the postmaster through a
shell, and not used "&" semantics, presumably that shell will wait for its
child process to exit and then exit itself. So *testing whether the shell
is still there is just about as good as testing the real postmaster PID*.
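For illustration only, the Unix side of the two points above might be sketched roughly as follows. This is not the attached patch: launch_via_shell() and child_is_alive() are hypothetical names, and the waitpid() call is an assumption of this sketch, added so that an exited-but-unreaped child is not mistaken for a live one.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "cmd" via "/bin/sh -c" and return the child PID,
 * or -1 on failure.  Prefixing "exec" makes the shell overlay itself with
 * the command, so the PID returned by fork() is the command's own PID.
 */
static pid_t
launch_via_shell(const char *cmd)
{
    char        buf[1024];
    int         n;
    pid_t       pid;

    n = snprintf(buf, sizeof(buf), "exec %s", cmd);
    if (n < 0 || (size_t) n >= sizeof(buf))
        return -1;              /* command line too long for this sketch */

    pid = fork();
    if (pid == 0)
    {
        /* In the child: become the shell, which then becomes the command. */
        execl("/bin/sh", "/bin/sh", "-c", buf, (char *) NULL);
        _exit(127);             /* reached only if exec failed */
    }
    return pid;                 /* child PID in the parent, or -1 if fork failed */
}

/*
 * Hypothetical helper: return 1 if our child process still exists, else 0.
 * Signal 0 delivers nothing; kill() only reports whether the PID exists.
 */
static int
child_is_alive(pid_t pid)
{
    assert(pid > 0);

    /* Reap the child if it already exited; a zombie still passes kill(pid, 0). */
    if (waitpid(pid, NULL, WNOHANG) == pid)
        return 0;

    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;      /* exists, but we may not signal it */
}
```

If the shell did not overlay itself (the CMD.EXE worry quoted above), the same liveness test would simply be watching the shell, which is the case point 2 argues is acceptable.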

Therefore, we don't really need to worry about rewriting the Windows
subprocess-launch logic. If someone would like to do it, that would
be nice, but we don't have to wait for that to happen before we can
get rid of the 5-second-timeout issues.

So the attached modified patch adjusts the PID-match logic and some
comments, but is otherwise what I posted before. I believe that this
might actually work on Windows, but I have no way to test it. Someone
please try that? (Don't forget to test the service-start path, too.)

regards, tom lane

Attachment: fork-exec-in-pg_ctl-2.patch (text/x-diff, 13.6 KB)