Re: Primary node detection race at clean startup

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: emond(dot)papegaaij(at)gmail(dot)com
Cc: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Primary node detection race at clean startup
Date: 2026-05-19 12:40:37
Message-ID: 20260519.214037.579991005061650329.ishii@postgresql.org
Lists: pgpool-hackers

Hi Emond,

> Hi,
>
> In our tests, we've found an issue that can cause all Pgpool nodes to
> report an incorrect 'Role: standby':
> Role : standby ← stale, never updated on this node
> Backend Role : primary ← actual SR-check result
>
> This can happen if all nodes in a watchdog cluster start with a clean
> state at the same time. If the first node is still trying to determine
> the primary database, its primary_node_id is -2. This value is then
> synced to other nodes in the cluster, causing all nodes to report the
> stale state indefinitely. Attached is a patch against 4.7 that should
> fix this.
>
> Note that this analysis was done by Claude Code and it also created
> the patch. The failure on our CI was real though and I think the
> explanation makes sense.

I have looked into the patch. Although I could not reproduce the
issue, I agree that the explanation makes sense. I have also run the
regression tests, and all of them passed. I am going to push the patch
to all supported branches.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
