Re: What is the best and easiest implementation to reliably wait for the completion of startup?

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is the best and easiest implementation to reliably wait for the completion of startup?
Date: 2011-05-28 03:42:39
Message-ID: 45EA0477951744379296D472397BCDD4@maumau
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> "MauMau" <maumau307(at)gmail(dot)com> writes:
>> The bad thing is that pg_ctl continues to wait until the specified
>> duration passes, even if postgres fails to start. For example, it is
>> naturally desirable for pg_ctl to terminate when postgresql.conf
>> contains a syntax error.
>
> Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
> revisions, but testing proves it does not work desirably in HEAD:
> not only does pg_ctl wait till its timeout elapses, but it then reports
> "server started" even though the server didn't start. That's clearly a
> bug :-(
>
> I think your proposal of a pipe-based solution might be overkill though.
> Seems like it would be sufficient for pg_ctl to give up if it doesn't
> see the postmaster.pid file present within a couple of seconds of
> postmaster startup. I don't really want to add logic to the postmaster
> to have the sort of reporting protocol you propose, because not
> everybody uses pg_ctl to start the postmaster. In any case, we need a
> fix in 9.1 ...

Yes, I was a bit afraid the pipe-based fix might be overkill, too, so I was
wondering if there might be an easier solution.
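
For reference, the general shape of a pipe-based wait can be sketched as
below. This is only an illustration under assumptions: the function name
start_and_wait, the b"ready" message, and passing the descriptor number as a
command-line argument are all hypothetical, and nothing here is pg_ctl or
postmaster code. The point is that the parent sees end-of-file as soon as a
failing child exits, so it does not have to wait for a timeout. It relies on
Unix-only descriptor passing.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a server that starts successfully: it reports
# readiness on the inherited pipe descriptor given as its first argument.
READY_CHILD = (
    "import os, sys; fd = int(sys.argv[1]); "
    "os.write(fd, b'ready'); os.close(fd)"
)


def start_and_wait(child_code):
    """Start a child and block until it reports readiness or exits.

    Returns (ready, child). ready is True only if the child wrote b"ready".
    """
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(
        [sys.executable, "-c", child_code, str(write_fd)],
        pass_fds=(write_fd,),  # Unix only: child inherits the write end
    )
    # Close the parent's copy of the write end. Once the child exits without
    # writing, no writer remains and os.read() returns b"" (end-of-file).
    os.close(write_fd)
    try:
        message = os.read(read_fd, 16)
    finally:
        os.close(read_fd)
    return message == b"ready", child
```

A child that exits early, for example because of a configuration error,
makes start_and_wait return False right away instead of after a timeout.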

"server started"... I missed it. That's certainly a bug, as you say.

I was also considering the postmaster.pid-based solution exactly as you
suggest, but it has a problem: how many seconds should we assume for "a
couple of seconds"? If the system load is temporarily so high that the
postmaster takes many seconds to create postmaster.pid, pg_ctl mistakenly
concludes that the postmaster failed to start. I know this is a rare,
hypothetical case. I don't like touching the postmaster logic and
complicating it, but logical correctness needs to come first (Japanese users
are very demanding).
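
To make the concern concrete, here is a minimal sketch of such a wait loop.
The names wait_for_pid_file, grace_seconds, and poll_interval are
hypothetical and this is not how pg_ctl is written. It only shows that a
False result cannot tell "failed to start" apart from "still starting
slowly".

```python
import os
import time


def wait_for_pid_file(pid_path, grace_seconds, poll_interval=0.1):
    """Return True if pid_path appears within grace_seconds, else False."""
    deadline = time.monotonic() + grace_seconds
    while True:
        if os.path.exists(pid_path):
            return True
        if time.monotonic() >= deadline:
            # Ambiguous result: the server may have failed, or it may simply
            # not have created the file yet on a heavily loaded system.
            return False
        time.sleep(poll_interval)
```

Whatever value is chosen for grace_seconds, a sufficiently slow startup gets
the same answer as a failed one.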

Another problem with the postmaster.pid-based solution arises after the
postmaster crashes. When the postmaster crashes, postmaster.pid is left
behind. If the pid in postmaster.pid has since been allocated to some
non-postgres process that is still running, pg_ctl misjudges that the
postmaster is starting up and waits for a long time.
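
The stale-file case can be illustrated with a small sketch. The helper names
read_pid and process_exists are hypothetical, the code assumes the pid is on
the first line of the file, and it is Unix-only because it relies on signal 0.
It is not pg_ctl's actual check. The liveness test answers "does some process
have this pid?", not "is that process the postmaster?".

```python
import os


def read_pid(pid_path):
    """Read the pid from the first line of a pid file."""
    with open(pid_path) as f:
        return int(f.readline().strip())


def process_exists(pid):
    """Return True if any process currently has this pid (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

If a leftover pid file happens to contain the pid of an unrelated running
process, process_exists(read_pid(path)) still returns True, which is exactly
the misjudgment described above.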

Regards
MauMau
