Race condition in pcp_node_info can cause it to hang

From: Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com>
To: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Race condition in pcp_node_info can cause it to hang
Date: 2026-06-04 13:00:38
Message-ID: CAGXsc+ZhGjwm+F42Xmt8Qn1qP_h7woipiV0WsY-e-P7W3ZG2OA@mail.gmail.com
Lists: pgpool-hackers

Hi,

We've hit another very rare flaky failure in our tests, in which
pcp_node_info hangs indefinitely. I analyzed the problem with
Claude Code, which arrived at the diagnosis and the (quite small) fix
below. A patch against 4.7 is attached.

The problem:
In inform_node_info() (src/pcp_con/pcp_worker.c), the code that builds
the PCP reply packet reads bi->replication_state and
bi->replication_sync_state directly from shared memory twice: once via
strlen() to compute the packet length, and once via pcp_write() to
write the payload.

The streaming-replication check worker rewrites those same
shared-memory strings without a lock (it clears them to "" then
repopulates them every check cycle and on state transitions,
src/streaming_replication/pool_worker_child.c). If the string's length
changes between the two reads, the declared wsize no longer matches
the bytes actually written, so the PCP byte stream desynchronises. The
client then blocks forever in pcp_read() waiting for bytes the server
never sends.
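
To make the failure mode concrete, here is a minimal single-threaded
sketch. It is not the pgpool code: shared_state, worker_rewrite() and
racy_reply_mismatch() are hypothetical names, the 32-byte field size is
an assumption, and the worker's rewrite is replayed deterministically
between the two reads instead of racing in a second process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STATE_LEN 32   /* assumed field size, for illustration only */

/* Hypothetical stand-in for one shared-memory string field. */
static char shared_state[STATE_LEN] = "streaming";

/* Stand-in for the check worker: clear the field, then repopulate it. */
void worker_rewrite(const char *new_state)
{
    assert(new_state != NULL);
    shared_state[0] = '\0';
    snprintf(shared_state, STATE_LEN, "%s", new_state);
}

/*
 * Reads the shared field twice, like the pattern described above.
 * If rewrite_between is non-NULL, a worker update is replayed between
 * the two reads. Returns declared length minus bytes actually written;
 * a positive value is the number of bytes the client would wait for.
 */
long racy_reply_mismatch(const char *rewrite_between)
{
    size_t declared;
    size_t written;

    declared = strlen(shared_state) + 1;   /* read 1: packet length */

    if (rewrite_between != NULL)
        worker_rewrite(rewrite_between);   /* simulated concurrent update */

    written = strlen(shared_state) + 1;    /* read 2: payload size */

    return (long) declared - (long) written;
}
```

With the field holding "streaming", replaying a clear to "" between the
two reads leaves the declared length 9 bytes larger than the payload,
which is the kind of gap the client would wait on.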

The fix:
Snapshot the two strings into local buffers once, right after
bi = pool_get_node_info(i), and use the locals for both the length and
the payload, so a single packet is always internally consistent. This
matches how every other field in the packet is already handled.
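
The snapshot idea can be sketched as follows. This is an illustration
under assumptions, not the attached patch: struct node_state, struct
reply, snapshot_string(), clear_states() and build_reply() are
hypothetical names, and the 32-byte field size is assumed. The
"between" callback replays a worker rewrite mid-build to show that the
declared length and the payload still agree. Note that the copy itself
is still unlocked, so a snapshot might capture a partially updated
string; the point is only that each packet is self-consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STATE_LEN 32   /* assumed field size, for illustration only */

/* Hypothetical stand-in for the two shared-memory fields. */
struct node_state {
    char replication_state[STATE_LEN];
    char replication_sync_state[STATE_LEN];
};

/* Result of building one reply: declared length vs. bytes emitted. */
struct reply {
    size_t declared;
    size_t written;
    char   payload[2 * STATE_LEN];
};

/* Bounded copy that always NUL-terminates the destination. */
void snapshot_string(char *dst, size_t cap, const char *src)
{
    size_t i;

    assert(cap > 0);
    for (i = 0; i + 1 < cap && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
}

/* Simulated worker update: clears both fields. */
void clear_states(struct node_state *shared)
{
    shared->replication_state[0] = '\0';
    shared->replication_sync_state[0] = '\0';
}

struct reply build_reply(struct node_state *shared,
                         void (*between)(struct node_state *))
{
    char state[STATE_LEN];
    char sync_state[STATE_LEN];
    struct reply r;

    /* Read shared memory exactly once per field. */
    snapshot_string(state, sizeof(state), shared->replication_state);
    snapshot_string(sync_state, sizeof(sync_state),
                    shared->replication_sync_state);

    /* Length is computed from the locals only. */
    r.declared = strlen(state) + 1 + strlen(sync_state) + 1;

    if (between != NULL)
        between(shared);   /* replay a worker rewrite mid-build */

    /* Payload is written from the same locals, including the NULs. */
    r.written = 0;
    memcpy(r.payload + r.written, state, strlen(state) + 1);
    r.written += strlen(state) + 1;
    memcpy(r.payload + r.written, sync_state, strlen(sync_state) + 1);
    r.written += strlen(sync_state) + 1;

    return r;
}
```

Calling build_reply(&s, clear_states) with the fields set to
"streaming" and "async" yields declared == written, even though the
shared fields are empty by the time the payload is assembled.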

Best regards,
Emond

Attachment Content-Type Size
pcp_node_info_hang.patch text/x-patch 2.5 KB
