| From: | Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com> |
|---|---|
| To: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Primary node detection race at clean startup |
| Date: | 2026-05-12 08:38:08 |
| Message-ID: | CAGXsc+ZmBoLs3Mz=G-Bdm4JJG+fH1NpHfR3qVJVwW4eBKWwStQ@mail.gmail.com |
| Lists: | pgpool-hackers |
Hi,
In our tests, we've found an issue that can cause all Pgpool nodes to
report an incorrect 'Role: standby':
Role : standby ← stale, never updated on this node
Backend Role : primary ← actual SR-check result
This can happen if all nodes in a watchdog cluster start with a clean
state at the same time. If the first node is still trying to determine
the primary database, its primary_node_id is -2. This value is then
synced to other nodes in the cluster, causing all nodes to report the
stale state indefinitely. Attached is a patch against 4.7 that should
fix this.
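To illustrate the idea, here is a minimal sketch of the kind of guard the
patch name suggests. It is not the attached patch and not actual Pgpool
code: the function and macro names are hypothetical, and only the value -2
comes from the description above. The assumption is that a node which has
already found a primary should not overwrite it with a leader value that
still means "not determined yet".

```c
#include <assert.h>

/* -2 is the "still determining the primary" value described above.
 * The macro name is hypothetical, chosen for this sketch only. */
#define PRIMARY_NODE_UNDETERMINED (-2)

/*
 * Hypothetical helper: choose the primary_node_id to keep when a state
 * sync arrives from the watchdog leader.
 */
int
resolve_primary_node_id(int local_id, int leader_id)
{
    /* Leader has not finished detection but we have: keep our value. */
    if (leader_id == PRIMARY_NODE_UNDETERMINED && local_id >= 0)
        return local_id;

    /* Otherwise follow the leader as usual. */
    return leader_id;
}

/* Small self-check covering the race and the normal sync paths. */
void
resolve_primary_node_id_selftest(void)
{
    assert(resolve_primary_node_id(0, PRIMARY_NODE_UNDETERMINED) == 0);
    assert(resolve_primary_node_id(PRIMARY_NODE_UNDETERMINED,
                                   PRIMARY_NODE_UNDETERMINED)
           == PRIMARY_NODE_UNDETERMINED);
    assert(resolve_primary_node_id(0, 1) == 1);
}
```

Without such a guard, the first case would return -2, which matches the
stale 'Role: standby' shown above.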
Note that this analysis was done by Claude Code and it also created
the patch. The failure on our CI was real though and I think the
explanation makes sense.
Best regards,
Emond Papegaaij
| Attachment | Content-Type | Size |
|---|---|---|
| pgpool-keep-local-primary-when-leader-initial.patch | text/x-patch | 3.1 KB |