Re: Unreliable "pg_ctl -w start" again

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-28 02:36:18
Message-ID: 4F0633450B7A4741B348040C8831EC90@maumau
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Well, feel free to increase that duration if you want.  The reason it's
> there is to not wait for a long time if the postmaster falls over
> instantly at startup, but in a non-interactive situation you might not
> care.

Yes, just lengthening the wait duration causes an unnecessarily long wait when we 
run pg_ctl interactively. Therefore, the current wait approach is not 

>> How about inserting postmaster_is_alive() as below?
> Looks like complete nonsense to me, if the goal is to behave sanely when
> postmaster.pid hasn't been created yet.  Where do you think get_pgpid
> gets the PID from?

Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which 
may be the pid of the previous postmaster that did not stop cleanly.

I think my simple fix makes sense to solve the problem as follows. Could you 
point out what might not be good?

1.The previous postmaster was terminated abruptly due to OS shutdown, 
machine failure, etc., leaving postmaster.pid behind.
2.Run "pg_ctl -w start" to start a new postmaster.
3.do_start() of pg_ctl reads the pid of the previously running postmaster 
from postmaster.pid. Say, let it be pid-1 (old_pid in code) here.

  old_pid = get_pgpid();

4.Anyway, try to start postmaster by calling start_postmaster().
5.If postmaster.pid existed at step 3, it means either of:

(a) Previous postmaster did not stop cleanly and left postmaster.pid behind.
(b) Another postmaster is already running in the data directory (it was 
started before this run of pg_ctl -w start).

But we can't distinguish between them. Then, we read postmaster.pid again to 
judge the situation. Let it be pid-2 (pid in code).

 if (old_pid != 0)
  pid = get_pgpid();

6.If pid-1 != pid-2, it means that the situation (a) applies and the newly 
started postmaster overwrote the old postmaster.pid. Then, try to connect to 
the postmaster.

If pid-1 == pid-2, it means either of:

(a') Previous postmaster did not stop cleanly and left postmaster.pid behind. 
The newly started postmaster will complete startup, but hasn't overwritten 
postmaster.pid yet.
(b) Another postmaster is already running in the data directory (it was 
started before this run of pg_ctl -w start).

The current comparison logic cannot distinguish between them. In my problem 
situation, situation a' happened, and pg_ctl mistakenly exited.

  if (pid == old_pid)
   write_stderr(_("%s: could not start server\n"
         "Examine the log output.\n"),

7.To distinguish between a' and b, check if pid-1 is alive. If pid-1 is 
alive, it means situation b. Otherwise, that is situation a'.

  if (pid == old_pid && postmaster_is_alive(old_pid))
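
The decision in steps 5 to 7 can be written down as a small standalone sketch. This is only an illustration, not the pg_ctl code: classify_start() and the enum names are invented for this example, and process_is_alive() is a stand-in for postmaster_is_alive() that assumes the POSIX kill(pid, 0) probe is good enough.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for this sketch; they do not appear in pg_ctl.c. */
typedef enum
{
    START_NO_OLD_PIDFILE,   /* no pid file existed before the start attempt */
    START_NEW_POSTMASTER,   /* (a)  pid file now holds a different pid */
    START_STILL_PENDING,    /* (a') stale pid file, not overwritten yet */
    START_ALREADY_RUNNING   /* (b)  another postmaster owns the data directory */
} StartState;

/* Stand-in for postmaster_is_alive(): probe the pid with signal 0. */
bool
process_is_alive(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return true;
    /* EPERM means the process exists but belongs to another user. */
    return errno == EPERM;
}

/*
 * old_pid: pid read from the pid file before starting (0 if there was none).
 * new_pid: pid read from the pid file after the start attempt.
 * old_pid_alive: result of the liveness probe on old_pid.
 */
StartState
classify_start(pid_t old_pid, pid_t new_pid, bool old_pid_alive)
{
    if (old_pid == 0)
        return START_NO_OLD_PIDFILE;
    if (new_pid != old_pid)
        return START_NEW_POSTMASTER;
    /* Same pid in both reads: only the liveness check separates a' from b. */
    return old_pid_alive ? START_ALREADY_RUNNING : START_STILL_PENDING;
}
```

With this shape, pg_ctl would give up only for START_ALREADY_RUNNING and keep waiting in the START_STILL_PENDING case.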

However, the pid of the newly started postmaster might match that of the old 
postmaster. To deal with that situation, it may be better to also check the 
modification timestamp of postmaster.pid.
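
A minimal sketch of such a timestamp check, assuming POSIX stat() and one-second resolution. The helper name pidfile_written_since() is hypothetical; the caller is assumed to record start_time with time(NULL) just before launching the postmaster.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: returns true if the file at `path` exists and was
 * last modified at or after `start_time`, i.e. it was probably written by
 * the postmaster we just started rather than left over from an old one.
 */
bool
pidfile_written_since(const char *path, time_t start_time)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return false;       /* missing or unreadable: not written yet */
    return st.st_mtime >= start_time;
}
```

A stale file written within the same second as start_time would still pass this test, so it narrows the window rather than closing it.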

What do you think?

> If we had the postmaster's PID a priori, we could detect postmaster
> death directly instead of having to make assumptions about how long
> is reasonable to wait for the pidfile to appear.  The problem is that
> we don't want to write a complete replacement for the shell's command
> line parser and I/O redirection logic.  It doesn't look like a small
> project.

Yes, I understand this. I don't think we can reimplement everything the shell does.

> (But maybe we could bypass that by doing a fork() and then having
> the child exec() the shell, telling it to exec postmaster in turn?)

Possibly. I hope this works. Then, we can pass unnamed pipe file descriptors 
to postmaster via environment variables from the pg_ctl's forked child.
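
To make the idea concrete, here is a rough sketch of fork() followed by exec of the shell, which in turn execs the target command. Because exec keeps the pid, the value returned by fork() is the pid of the final program. The function name start_via_shell() and the environment variable SKETCH_PIPE_FD are made up for this example; this is not how pg_ctl currently starts the postmaster.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that execs /bin/sh, which in turn execs `command`.
 * The write end of an unnamed pipe is inherited by the child and its
 * number is advertised through SKETCH_PIPE_FD (a hypothetical name).
 * On success the read end is stored in *read_fd and the child's pid is
 * returned; on failure -1 is returned.
 */
pid_t
start_via_shell(const char *command, int *read_fd)
{
    int     fds[2];
    pid_t   pid;
    char    shell_cmd[1024];
    char    fd_str[16];
    int     len;

    assert(command != NULL && read_fd != NULL);
    len = snprintf(shell_cmd, sizeof(shell_cmd), "exec %s", command);
    if (len < 0 || len >= (int) sizeof(shell_cmd))
        return -1;
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: keep only the write end and tell the command about it. */
        close(fds[0]);
        snprintf(fd_str, sizeof(fd_str), "%d", fds[1]);
        setenv("SKETCH_PIPE_FD", fd_str, 1);
        execl("/bin/sh", "sh", "-c", shell_cmd, (char *) NULL);
        _exit(127);         /* exec failed */
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *read_fd = fds[0];
    return pid;
}
```

Once every holder of the write end has exited, a read() on *read_fd returns 0, so the parent can notice the death of the started process without polling for the pid file.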

> And of course Windows as usual makes things twice as hard, since we
> couldn't make such a change unless start_postmaster could return the
> proper PID in that case too.

Well, we can make start_postmaster() return the pid of the newly created 
postmaster. CreateProcess() sets the process handle in the structure passed 
to it. We can pass the process handle to WaitForSingleObject() to know 
whether postmaster is alive.
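
A sketch of what that could look like, assuming the plain Win32 CreateProcessA() call rather than whatever start_postmaster() really uses. The names child_t, start_child() and child_is_alive() are hypothetical, and the non-Windows branch is included only so that the same example builds on POSIX systems.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>

typedef HANDLE child_t;

/* Start `cmdline`; returns the process handle, or NULL on failure. */
child_t
start_child(char *cmdline)
{
    STARTUPINFOA        si;
    PROCESS_INFORMATION pi;

    assert(cmdline != NULL);
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));
    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0,
                        NULL, NULL, &si, &pi))
        return NULL;
    CloseHandle(pi.hThread);    /* only the process handle is needed */
    return pi.hProcess;         /* pi.dwProcessId holds the numeric pid */
}

/* A zero-timeout wait returns WAIT_TIMEOUT while the process still runs. */
bool
child_is_alive(child_t child)
{
    return WaitForSingleObject(child, 0) == WAIT_TIMEOUT;
}

#else
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef pid_t child_t;

/* Start `cmdline` through /bin/sh; returns the pid, or -1 on failure. */
child_t
start_child(char *cmdline)
{
    pid_t pid;

    assert(cmdline != NULL);
    pid = fork();
    if (pid == 0)
    {
        execl("/bin/sh", "sh", "-c", cmdline, (char *) NULL);
        _exit(127);             /* exec failed */
    }
    return pid;
}

/* waitpid() with WNOHANG returns 0 while the child is still running. */
bool
child_is_alive(child_t child)
{
    return child > 0 && waitpid(child, NULL, WNOHANG) == 0;
}
#endif
```

With either branch, the wait loop in pg_ctl could call child_is_alive() on each iteration and stop waiting as soon as the process is gone.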

