| From: | Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com> |
|---|---|
| To: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Race condition in pcp_node_info can cause it to hang |
| Date: | 2026-06-04 13:00:38 |
| Message-ID: | CAGXsc+ZhGjwm+F42Xmt8Qn1qP_h7woipiV0WsY-e-P7W3ZG2OA@mail.gmail.com |
| Lists: | pgpool-hackers |
Hi,
We've hit another very rare flake in our tests, which can cause
pcp_node_info to hang indefinitely. I've analyzed the problem with
Claude Code, which arrived at the conclusion and the (quite small) fix
below. Attached is a patch against 4.7.
The problem:
In inform_node_info() (src/pcp_con/pcp_worker.c), the PCP reply packet
reads bi->replication_state and bi->replication_sync_state directly
from shared memory twice: once via strlen() to compute the packet
length, and once via pcp_write() to write the payload.
The streaming-replication check worker rewrites those same
shared-memory strings without a lock (it clears them to "" then
repopulates them every check cycle and on state transitions,
src/streaming_replication/pool_worker_child.c). If the string's length
changes between the two reads, the declared wsize no longer matches
the bytes actually written, so the PCP byte stream desynchronises. The
client then blocks forever in pcp_read() waiting for bytes the server
never sends.
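The mismatch can be illustrated with a small, deterministic simulation. This is only a sketch and not pgpool code: FakeNodeInfo, ReplySizes, racy_reply() and STATE_LEN are hypothetical names, and the "concurrent" rewrite is forced between the two reads rather than done by a second process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STATE_LEN 16            /* hypothetical field size */

/* Hypothetical stand-in for the shared-memory node entry. */
typedef struct
{
    char replication_state[STATE_LEN];
} FakeNodeInfo;

/* Declared size vs. bytes actually written for one simulated reply. */
typedef struct
{
    size_t declared;
    size_t written;
} ReplySizes;

/*
 * Mimics the racy pattern: read the string once for the length, let a
 * "concurrent" writer change it, then read it again for the payload.
 * The interleaving is forced here so the outcome is deterministic.
 */
static ReplySizes
racy_reply(FakeNodeInfo *bi, const char *concurrent_value)
{
    ReplySizes r;

    assert(concurrent_value != NULL);

    /* first read: length that would go into the packet header */
    r.declared = strlen(bi->replication_state) + 1;

    /* the check worker rewrites the field between the two reads */
    strncpy(bi->replication_state, concurrent_value, STATE_LEN - 1);
    bi->replication_state[STATE_LEN - 1] = '\0';

    /* second read: bytes that would actually be sent as payload */
    r.written = strlen(bi->replication_state) + 1;
    return r;
}
```

With the field initially "streaming" and the worker clearing it to "", the declared size is 10 bytes but only 1 byte is written, so a reader waiting for the declared length never gets the rest.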
The fix:
Snapshot the two strings into local buffers once, right after bi =
pool_get_node_info(i), and use the locals for both the length and the
payload, so a single packet is always internally consistent. This
matches how every other field in the packet is already handled.
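A minimal sketch of the snapshot approach follows. It is not the attached patch: FakeNodeInfo, snapshot_str(), build_reply() and STATE_LEN are hypothetical, and the output layout (two NUL-terminated strings back to back) is a simplification rather than the real PCP wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STATE_LEN 16            /* hypothetical field size */

/* Hypothetical stand-in for the shared-memory node entry. */
typedef struct
{
    char replication_state[STATE_LEN];
    char replication_sync_state[STATE_LEN];
} FakeNodeInfo;

/* Copy src into dst (capacity cap), always NUL-terminating dst. */
static void
snapshot_str(char *dst, size_t cap, const char *src)
{
    assert(cap > 0);
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/*
 * Snapshot both strings once, then derive the declared size and the
 * payload from the same local copies. Returns the bytes written to out,
 * or 0 if out is too small; *declared receives the announced size.
 */
static size_t
build_reply(const FakeNodeInfo *bi, char *out, size_t out_cap,
            size_t *declared)
{
    char   state[STATE_LEN];
    char   sync_state[STATE_LEN];
    size_t n1, n2;

    /* the only reads of the shared entry */
    snapshot_str(state, sizeof(state), bi->replication_state);
    snapshot_str(sync_state, sizeof(sync_state), bi->replication_sync_state);

    /* length and payload both come from the locals */
    n1 = strlen(state) + 1;
    n2 = strlen(sync_state) + 1;
    *declared = n1 + n2;

    if (*declared > out_cap)
        return 0;

    memcpy(out, state, n1);
    memcpy(out + n1, sync_state, n2);
    return n1 + n2;
}
```

A snapshot taken without a lock may still capture a value that is mid-update, but the declared size and the bytes written always agree, which is what keeps the byte stream in sync.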
Best regards,
Emond
| Attachment | Content-Type | Size |
|---|---|---|
| pcp_node_info_hang.patch | text/x-patch | 2.5 KB |