Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: robertmhaas(at)gmail(dot)com, kuroda(dot)hayato(at)fujitsu(dot)com, shlok(dot)kyal(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Date: 2024-01-11 08:33:22
Message-ID: 20240111.173322.1809044112677090191.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for restarting this thread.

At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:
> > I'm not a Windows expert, but my guess is that 0001 is a very good
> > idea. I hope someone who is a Windows expert will comment on that.
>
> I am +1 on 0001. It is just something we've never anticipated when
> these wrappers around cmd in pg_ctl were written.

Thanks for committing it.

> > 0002 seems problematic to me. One potential issue is that it would
> > break if someone renamed postgres.exe to something else -- although
> > that's probably not really a serious problem.
>
> We do a find_other_exec_or_die() on "postgres" with what could be a
> custom execution path. So we're actually sure that the binary will be
> there in the start path, no? I don't like much the hardcoded
> dependency to .exe here.

The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another reason. I'll describe that there.

> > A bigger issue is that
> > it seems like it would break if someone used pg_ctl to start several
> > instances in different data directories on the same machine. If I'm
> > understanding correctly, that currently mostly works, and this would
> > break it.
>
> Not having the guarantee that a single shell_pid is associated to a
> single postgres.exe would be a problem. Now the patch includes this
> code:
> + if (ppe.th32ParentProcessID == shell_pid &&
> + strcmp("postgres.exe", ppe.szExeFile) == 0)
> + {
> + if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
> + multiple_children = true;
> + pm_pid = ppe.th32ProcessID;
> + }
>
> Which is basically giving this guarantee? multiple_children should
> never happen once the autorun part is removed. Is that right?

The patch indeed ensures the relationship between the parent
pg_ctl.exe and postgres.exe. However, for some reason, in my Windows
11 environment with the /D option specified, I observed that another
cmd.exe is spawned as the second child process of the parent
cmd.exe. This is why there is a need to verify the executable file
name. I have no idea how the second cmd.exe is being spawned.

> + * The launcher shell might start other cmd.exe instances or programs
> + * besides postgres.exe. Veryfying the program file name is essential.
>
> With the autorun part of cmd.exe removed, what's still relevant here?

Yes, if the strcmp() is commented out, multiple_children sometimes
becomes true..

> s/Veryfying/Verifying/.

Oops!

> Perhaps 0002 should make more efforts in documenting things like
> th32ProcessID and th32ParentProcessID.

Is it correct to understand that you are requesting changes as follows?

--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
*
* Check for duplicate processes to ensure reliability.
*
- * The launcher shell might start other cmd.exe instances or programs
- * besides postgres.exe. Verifying the program file name is essential.
- *
- * The launcher shell process isn't checked in this function. It will be
- * checked by the caller.
+ * The ppe entry to be examined is identified by th32ParentProcessID, which
+ * should correspond to the cmd.exe process that executes the postgres.exe
+ * binary. Additionally, th32ProcessID in the same entry should be the PID
+ * of the launched postgres.exe. However, even though we have launched the
+ * parent cmd.exe with the /D option specified, it is sometimes observed
+ * that another cmd.exe is launched for unknown reasons. Therefore, it is
+ * crucial to verify the program file name to avoid returning the wrong
+ * PID.
*/

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Attachment Content-Type Size
v6-0001-Improve-pg_ctl-postmaster-process-check-on-Window.patch text/x-patch 5.3 KB
v6-0002-Remove-short-sleep-from-001_start_stop.pl.patch text/x-patch 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2024-01-11 08:54:20 Improve the connection failure error messages
Previous Message Bertrand Drouvot 2024-01-11 07:49:47 Re: Synchronizing slots from primary to standby