Primary node detection race at clean startup

From:	Emond Papegaaij <emond(dot)papegaaij(at)gmail(dot)com>
To:	pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject:	Primary node detection race at clean startup
Date:	2026-05-12 08:38:08
Message-ID:	CAGXsc+ZmBoLs3Mz=G-Bdm4JJG+fH1NpHfR3qVJVwW4eBKWwStQ@mail.gmail.com
Lists:	pgpool-hackers

Hi,

In our tests, we've found an issue that can cause all Pgpool nodes to
report an incorrect 'Role: standby':
    Role         : standby   ← stale, never updated on this node
    Backend Role : primary   ← actual SR-check result

This can happen if all nodes in a watchdog cluster start from a clean
state at the same time. While the first node is still trying to determine
the primary database, its primary_node_id is -2. This value is then
synced to the other nodes in the cluster, causing all nodes to report the
stale state indefinitely. Attached is a patch against 4.7 that should
fix this.
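
To illustrate the idea, here is a minimal sketch of the guard that the
patch name suggests. It is not the attached patch and does not use
Pgpool-II's actual functions: resolve_primary_node_id() and
PRIMARY_UNDETERMINED are hypothetical names, and only the value -2 comes
from the description above. The assumption is that a node should ignore
the leader's value while the leader is still undetermined:

```c
#include <assert.h>

/* Value from the report: the leader has not yet determined a primary. */
#define PRIMARY_UNDETERMINED (-2)

/*
 * Hypothetical helper (not a Pgpool-II function): decide which primary
 * node id a node keeps after receiving the leader's value during sync.
 */
int resolve_primary_node_id(int local_primary, int leader_primary)
{
    /* Leader is still detecting: keep what the local check found. */
    if (leader_primary == PRIMARY_UNDETERMINED)
        return local_primary;

    /* Otherwise the leader's view wins, as in a normal sync. */
    return leader_primary;
}

/* Walks through the clean-startup scenario described above. */
void demo_clean_startup_race(void)
{
    int local = 0; /* this node already found backend 0 to be primary */

    /* The leader syncs its still-initial value; the local result survives. */
    local = resolve_primary_node_id(local, PRIMARY_UNDETERMINED);
    assert(local == 0);

    /* Once the leader has a real answer, it is adopted as usual. */
    local = resolve_primary_node_id(local, 1);
    assert(local == 1);
}
```

Without such a guard, the first call would overwrite the local result
with -2, which matches the stale 'Role: standby' we saw.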

Note that this analysis was done by Claude Code and it also created
the patch. The failure on our CI was real though and I think the
explanation makes sense.

Best regards,
Emond Papegaaij

Attachment	Content-Type	Size
pgpool-keep-local-primary-when-leader-initial.patch	text/x-patch	3.1 KB

Responses

Re: Primary node detection race at clean startup at 2026-05-12 10:49:31 from Emond Papegaaij
Re: Primary node detection race at clean startup at 2026-05-19 12:40:37 from Tatsuo Ishii
