| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | emond(dot)papegaaij(at)gmail(dot)com |
| Cc: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Primary node detection race at clean startup |
| Date: | 2026-05-19 12:40:37 |
| Message-ID: | 20260519.214037.579991005061650329.ishii@postgresql.org |
| Lists: | pgpool-hackers |
Hi Emond,
> Hi,
>
> In our tests, we've found an issue that can cause all Pgpool nodes to
> report an incorrect 'Role: standby':
> Role : standby ← stale, never updated on this node
> Backend Role : primary ← actual SR-check result
>
> This can happen if all nodes in a watchdog cluster start with a clean
> state at the same time. If the first node is still trying to determine
> the primary database, its primary_node_id is -2. This value is then
> synced to other nodes in the cluster, causing all nodes to report the
> stale state indefinitely. Attached is a patch against 4.7 that should
> fix this.
>
> Note that this analysis was done by Claude Code and it also created
> the patch. The failure on our CI was real though and I think the
> explanation makes sense.
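A toy Python model of the race described in the quoted report follows. It takes from the report that primary_node_id is -2 while a node is still determining the primary. It also assumes, without confirmation from the thread, that a receiving node adopts the synced value as-is. The names `apply_synced_primary`, `backend_role` and the `guard` flag are hypothetical. This is neither pgpool-II source nor the attached patch, and the guard is only one plausible way to avoid the problem.

```python
# Toy model of the clean-startup race; NOT pgpool-II source or the patch.
# From the report: primary_node_id is -2 while detection is still running.
# Assumption: a node receiving a synced value adopts it without re-checking.
PRIMARY_DETECTION_IN_PROGRESS = -2


def backend_role(backend_id: int, primary_node_id: int) -> str:
    """Role reported for a backend, given the node's primary_node_id."""
    return "primary" if backend_id == primary_node_id else "standby"


def apply_synced_primary(local_id: int, synced_id: int, guard: bool) -> int:
    """Return the primary_node_id a node keeps after a watchdog sync.

    guard=False models the reported bug: the sentinel is copied blindly.
    guard=True models one hypothetical fix: the "still detecting"
    sentinel is never allowed to overwrite the node's own value.
    """
    if guard and synced_id == PRIMARY_DETECTION_IN_PROGRESS:
        return local_id
    return synced_id


# Assumed scenario: backend 0 is the real primary and this node's own
# SR check found it, but a peer that started at the same moment has not.
actual_primary = 0
local_detected = 0
peer_value = PRIMARY_DETECTION_IN_PROGRESS

for guard in (False, True):
    kept = apply_synced_primary(local_detected, peer_value, guard)
    print(f"guard={guard}: Role={backend_role(actual_primary, kept)}, "
          f"Backend Role=primary")
```

With the guard off, the synced -2 leaves backend 0 reported as a standby although the SR check says it is the primary, which is the mismatch shown in the report.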
I have looked into the patch. Although I failed to reproduce the
issue, I agree with you: the explanation makes sense. I have also run
the regression tests and all tests passed. I am going to push the patch
to all supported branches.
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp