| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Problem with pcp process |
| Date: | 2026-04-26 05:56:19 |
| Message-ID: | 20260426.145619.2101180528950972824.ishii@postgresql.org |
| Lists: | pgpool-hackers |
> Koshino told me off-list that the following script does not work:
> -------------------------------------
> pgpool_setup -n 3 --no-stop
> pg_ctl -D data2 stop
> while true
> do
>     psql -p 11000 -c "show pool_nodes" test
>     if [ $? = 0 ]; then
>         break
>     fi
>     sleep 1
> done
> psql -p 11000 -c "show pool_nodes" test
> pcp_recovery_node -p 11001 -n 2;pcp_promote_node -p 11001 -n 2 -s -g
> -------------------------------------
>
> pcp_recovery_node reports success but pcp_promote_node just hangs. I
> found that the pcp worker process loops infinitely around line 584 in
> pool_detach_node (pcp_worker.c):
>
>     while (!pcp_worker_wakeup_request)
>     {
>         struct timeval t = {1, 0};
>
>         select(0, NULL, NULL, NULL, &t);
>     }
>
> pcp_worker_wakeup_request is a variable that is supposed to be set to
> 1 by the SIGUSR2 signal handler. When pgpool main finishes a failover
> request from pcp, it sends SIGUSR2 to the pcp main process, which
> forwards it to the pcp worker process, whose signal handler sets the
> variable to 1. To find the process id to forward the signal to, the
> pcp main process keeps a list of pids of its forked children (pcp
> worker processes) in its local memory.
>
> Upon failover, pgpool main sends a signal to the pcp main process to
> request a restart, and pcp main restarts. The problem is that when pcp
> main restarts, it forgets the list of pids. As a result, when pgpool
> main sends SIGUSR2 to pcp main, pcp main cannot find the pid to send
> the signal to, which causes the infinite loop in the pcp worker
> process.
>
> To fix the problem, we could delay the restarting of pcp main until it
> delivers the signal. Unfortunately this does not work, since pgpool
> main waits for pcp main process to exit. Thus processing failover does
> not proceed in pgpool main.
>
> So I decided to add a new shared memory area to hold the pcp workers'
> pids as an array. When the pcp main process restarts, it reads the
> pids from the shared memory into its local memory. When a child
> process is forked, its pid is added to the shared memory array. When a
> child process exits, its pid in the array is cleared to 0,
> representing an empty slot.
>
> Attached is a patch to implement it.
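
For readers who want to see the wakeup mechanism quoted above in isolation, here is a minimal sketch. It is not the pgpool code: the names wakeup_request, install_wakeup_handler and wait_for_wakeup are made up for illustration, and only the flag-plus-select() pattern is taken from the quoted loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for pcp_worker_wakeup_request. volatile
 * sig_atomic_t is the type a signal handler may safely write. */
static volatile sig_atomic_t wakeup_request = 0;

/* SIGUSR2 handler: only sets the flag, nothing else. */
static void wakeup_handler(int sig)
{
    (void) sig;
    wakeup_request = 1;
}

/* Install the handler for SIGUSR2 (hypothetical helper). */
static void install_wakeup_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);            /* SIGUSR2 is a valid, catchable signal */
    (void) rc;
}

/* Sleep in one-second slices until the flag is set, then reset it.
 * Returns the number of full one-second timeouts that elapsed. */
static int wait_for_wakeup(void)
{
    int timeouts = 0;

    while (!wakeup_request)
    {
        struct timeval t = {1, 0};

        /* select() with no descriptors acts as a sleep: it returns 0
         * on timeout and -1 (EINTR) when a handled signal arrives. */
        if (select(0, NULL, NULL, NULL, &t) == 0)
            timeouts++;
    }
    wakeup_request = 0;
    return timeouts;
}
```

If nobody ever delivers SIGUSR2 to the waiting process, wait_for_wakeup() never returns, which is the hang described above. The one-second timeout only bounds the delay when the signal arrives between the flag check and the select() call.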
In the patch there were duplicate for loops in
pcp_child.c:reaper(). The attached v2 patch removes the duplicate. It
also updates the copyright year of pcp_child.c.
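
To illustrate the pid bookkeeping (this is not taken from the patch), here is a rough sketch of a slot array where 0 marks an empty slot and each operation needs only one loop. The function names pid_slot_add, pid_slot_clear and pid_slot_reload are hypothetical, and a plain array stands in for the shared memory area.

```c
#include <assert.h>
#include <sys/types.h>

/* Record pid in the first empty slot (value 0).
 * Returns the slot index, or -1 if the array is full. */
static int pid_slot_add(pid_t *slots, int nslots, pid_t pid)
{
    assert(pid > 0);
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i] == 0)
        {
            slots[i] = pid;
            return i;
        }
    }
    return -1;
}

/* Clear the slot holding pid, e.g. when the child has been reaped.
 * Returns 1 if the pid was found, 0 otherwise. */
static int pid_slot_clear(pid_t *slots, int nslots, pid_t pid)
{
    if (pid <= 0)
        return 0;               /* never match an empty slot */
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i] == pid)
        {
            slots[i] = 0;
            return 1;
        }
    }
    return 0;
}

/* After a restart, copy the non-empty slots into a local list.
 * Returns the number of pids copied. */
static int pid_slot_reload(const pid_t *slots, int nslots,
                           pid_t *local, int nlocal)
{
    int n = 0;

    for (int i = 0; i < nslots && n < nlocal; i++)
    {
        if (slots[i] != 0)
            local[n++] = slots[i];
    }
    return n;
}
```

In this sketch the parent would call pid_slot_add() after fork(), pid_slot_clear() from its SIGCHLD reaper, and pid_slot_reload() once at startup, so a restarted parent can still find the workers it needs to signal.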
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Fix-pcp-main-process-to-remember-child-pids-upon-.patch | application/octet-stream | 5.7 KB |