Problem with pcp process

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Problem with pcp process
Date: 2026-04-25 13:30:51
Message-ID: 20260425.223051.1207744844622514060.ishii@postgresql.org
Lists: pgpool-hackers

Koshino told me off-list that the following script does not work:
-------------------------------------
pgpool_setup -n 3 --no-stop
pg_ctl -D data2 stop
while true
do
    psql -p 11000 -c "show pool_nodes" test
    if [ $? = 0 ];then
        break;
    fi
    sleep 1
done
psql -p 11000 -c "show pool_nodes" test
pcp_recovery_node -p 11001 -n 2;pcp_promote_node -p 11001 -n 2 -s -g
-------------------------------------

pcp_recovery_node reports success, but pcp_promote_node just hangs. I
found that the pcp worker process loops forever around line 584 of
pool_detach_node (pcp_worker.c):

    while (!pcp_worker_wakeup_request)
    {
        struct timeval t = {1, 0};

        select(0, NULL, NULL, NULL, &t);
    }

pcp_worker_wakeup_request is a variable that is supposed to be set to
1 by the SIGUSR2 signal handler. When pgpool main finishes a failover
request from pcp, it sends SIGUSR2 to the pcp main process, which
forwards the signal to the pcp worker process, whose signal handler
sets the variable to 1. To find the process id to forward the signal
to, the pcp main process keeps a list of the pids of its forked
children (pcp worker processes) in its local memory.

Upon failover, pgpool main sends a signal to the pcp main process to
request a restart, and pcp main restarts. The problem is that when
pcp main restarts, it forgets the list of pids. As a result, when
pgpool main sends SIGUSR2 to pcp main, pcp main cannot find the pid
to forward the signal to, and the pcp worker process loops forever.

To fix the problem, we could delay the restart of pcp main until it
has delivered the signal. Unfortunately this does not work, because
pgpool main waits for the pcp main process to exit, so failover
processing in pgpool main would not proceed.

So I decided to add a new shared memory area that holds the pcp
worker pids as an array. When the pcp main process restarts, it reads
the pids from shared memory into its local memory. When a child
process is forked, its pid is added to the shared memory array. When
a child process exits, its slot in the array is cleared to 0, which
represents an empty slot.

Attached is a patch to implement it.

I also found a similar issue with pgpool_setup. For example,
pgpool_setup -n 3 creates 3 PostgreSQL nodes. To create the standbys,
pgpool_setup uses the pcp_recovery_node command. The first creation
works fine, but in the second one pcp_recovery_node actually times
out (after 5 seconds). pcp_recovery_node has a loop similar to the
one above, except that it times out instead of looping forever. As a
result, the second pcp_recovery_node looks as if it succeeded and
just takes longer (5 seconds). The patch also fixes this case: the
second pcp_recovery_node now finishes quickly.
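
To illustrate the difference between the two loops, here is a rough
sketch of a wait loop with a timeout. It is not the actual
pcp_recovery_node code: the names wakeup_request and
wait_for_wakeup_with_timeout and the one-second tick are assumptions.
The caller can tell from the return value whether the flag was really
set or the wait merely expired, which is the distinction that gets
lost when a timeout is treated as success.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical flag, set to 1 by a signal handler elsewhere. */
static volatile sig_atomic_t wakeup_request = 0;

/* Poll the flag once per tick for at most timeout_sec ticks.
 * Returns 1 if the flag was set, 0 if the wait timed out.
 * Note: select() can return early when a signal interrupts it, so
 * timeout_sec is an upper bound on ticks, not an exact duration. */
static int wait_for_wakeup_with_timeout(int timeout_sec)
{
    int waited;

    for (waited = 0; !wakeup_request && waited < timeout_sec; waited++)
    {
        struct timeval t = {1, 0};

        select(0, NULL, NULL, NULL, &t);
    }
    return wakeup_request ? 1 : 0;
}
```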

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Attachment Content-Type Size
pcp_child_sigusr2_fix.patch text/x-patch 3.6 KB
