Re: pg_ctl/pg_rewind tests vs. slow AIX buildfarm members

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl/pg_rewind tests vs. slow AIX buildfarm members
Date: 2015-10-09 04:46:40
Message-ID: CAB7nPqQW3EYU7j=xBuX2ohj7+gcysduCyjTcAxvcrQ_CSJ8vaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 8, 2015 at 3:09 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Oct 7, 2015 at 11:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>> On Sat, Sep 26, 2015 at 9:12 AM, Tom Lane wrote:
>>>> So the attached modified patch adjusts the PID-match logic and some
>>>> comments, but is otherwise what I posted before. I believe that this
>>>> might actually work on Windows, but I have no way to test it. Someone
>>>> please try that? (Don't forget to test the service-start path, too.)
>>
>>> If "pg_ctl start" is invoked directly from a command prompt, it works.
>>> Now, if I run "pg_ctl start" within a script (in an msysgit
>>> environment for example), pg_ctl fails, complaining that
>>> postmaster.pid already exists. The TAP tests get broken as well,
>>
>> Reading that again, I think you mean that "pg_ctl start" works but you
>> didn't try "pg_ctl start -w" manually? Because it's "-w" that's at
>> issue here, and the failing test cases are using that.
>
> Yes, sorry. I fat-fingered the command from the prompt and forgot the
> -w switch. test_postmaster_connection just behaves incorrectly, and
> exists immediately with PQPING_NO_RESPONSE, aka "Stopped waiting,
> could not start server, blah".
>
>> I think there is still room to salvage something without fully rewriting
>> the postmaster invocation logic to avoid using CMD, because it's still
>> true that checking whether the CMD process is still there should be as
>> good as checking the postmaster proper. We just can't use kill() for it.
>> I'd be inclined to get rid of the use of kill() on Unix as well; as Noah
>> mentioned upthread, if we're using fork/exec we might as well pay
>> attention to waitpid's results instead. On Windows, I imagine that the
>> thing to do is have start_postmaster() save aside the handle for the CMD
>> process, and then in test_postmaster_connection(), check the handle state
>> to see if the process is still running (I assume there's a way to do
>> that).
>
> That would be WaitForSingleObject(handle, ms_timeout) ==
> WAIT_OBJECT_0, no? The handle should be picked up from
> CreateRestrictedProcess, and I think that CloseHandle should not be
> called on pi.hProcess if we are going to wait for it afterwards.
> Reference:
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032%28v=vs.85%29.aspx

I had a look at that, and attached is an updated patch showing the
concept of using an HANDLE that pg_ctl waits for. I agree that saving
an HANDLE the way this patch does using a static variable is a bit
ugly especially knowing that service registration uses
test_postmaster_connection as well with directly a postmaster. The
good thing is that tapcheck runs smoothly, with one single failure
though: the second call to pg_ctl start -w may succeed instead of
failing if kicked within an interval of 3 seconds after the first one,
based on the tests on my Windows VM. My guess is that this is caused
by the fact that we monitor the shell process and not the postmaster
itself. Thoughts?
Regards,
--
Michael

Attachment Content-Type Size
20151009_pgctl_start.patch text/x-diff 11.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-10-09 06:04:31 Re: Foreign join pushdown vs EvalPlanQual
Previous Message Haribabu Kommi 2015-10-09 04:00:30 Re: Multi-tenancy with RLS