Primary node detection race at clean startup

From: Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com>
To: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Primary node detection race at clean startup
Date: 2026-05-12 08:38:08
Message-ID: CAGXsc+ZmBoLs3Mz=G-Bdm4JJG+fH1NpHfR3qVJVwW4eBKWwStQ@mail.gmail.com
Lists: pgpool-hackers

Hi,

In our tests, we've found an issue that can cause all Pgpool nodes to
report an incorrect 'Role: standby':
    Role         : standby   ← stale, never updated on this node
    Backend Role : primary   ← actual SR-check result

This can happen when all nodes in a watchdog cluster start from a clean
state at the same time. While the first node is still trying to
determine the primary database, its primary_node_id is -2. This value
is then synced to the other nodes in the cluster, causing all nodes to
report the stale state indefinitely. Attached is a patch against 4.7
that should fix this.
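To make the race easier to follow, here is a minimal C model of it. This is a sketch, not the attached patch and not Pgpool-II code: NodeView, role_of, sync_naive, sync_guarded and PRIMARY_DETECTION_IN_PROGRESS are hypothetical names. It assumes, as the output above suggests, that "Role" is derived by comparing a backend's id with primary_node_id, so a synced value of -2 makes every backend look like a standby. The guarded variant follows the idea in the patch name (keep the local primary while the leader is still initializing).

```c
#include <assert.h>

/* Assumption: per the report, a node still detecting the primary holds
 * primary_node_id == -2. The constant name is hypothetical. */
#define PRIMARY_DETECTION_IN_PROGRESS (-2)

/* Hypothetical model of one Pgpool node's view of the backends. */
typedef struct {
    int primary_node_id;
} NodeView;

/* "Role" derived from primary_node_id: only the backend whose id matches
 * is reported as primary, so -2 makes every backend a "standby". */
static const char *role_of(const NodeView *view, int backend_id)
{
    assert(backend_id >= 0);
    return backend_id == view->primary_node_id ? "primary" : "standby";
}

/* Naive sync: copy whatever the leader reports, including -2. */
static void sync_naive(NodeView *local, int leader_primary)
{
    local->primary_node_id = leader_primary;
}

/* Guarded sync: while the leader is itself still detecting the primary,
 * keep the locally determined value instead of overwriting it. */
static void sync_guarded(NodeView *local, int leader_primary)
{
    if (leader_primary == PRIMARY_DETECTION_IN_PROGRESS)
        return;
    local->primary_node_id = leader_primary;
}
```

With sync_naive, a node that had found backend 0 as primary ends up with primary_node_id == -2 and reports backend 0 as "standby"; with sync_guarded it keeps 0 until the leader sends a real id. The sketch does not model when or whether the leader re-broadcasts its final result.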

Note that this analysis was done by Claude Code, which also created
the patch. The failure on our CI was real, though, and I think the
explanation makes sense.

Best regards,
Emond Papegaaij

Attachment: pgpool-keep-local-primary-when-leader-initial.patch (text/x-patch, 3.1 KB)
