Pgpool-II - Tcp session time out between standby nodes

From: Nisrine Abdou <nisrine(dot)abdou(at)hotmail(dot)com>
To: "pgpool-general(at)lists(dot)postgresql(dot)org" <pgpool-general(at)lists(dot)postgresql(dot)org>
Subject: Pgpool-II - Tcp session time out between standby nodes
Date: 2025-09-10 16:05:35
Message-ID: PR3P195MB1119E18F000D0B73D87095A7F80EA@PR3P195MB1119.EURP195.PROD.OUTLOOK.COM
Lists: pgpool-general

Hi all,

I'm new here, and I'm no network/Linux expert, so please bear with me :)

We have an issue on our Pgpool-II cluster where TCP sessions between standby nodes time out on the firewall but are not dropped on the servers.
This happens because the system's tcp_keepalive_time parameter is greater than the idle timeout configured on the firewall.
Hence, the standby nodes realize that the TCP connection between them is lost only when the system sends out its keepalive probe, which is too late.
This results in the following recurring messages in the Pgpool-II log files:

LOG: read from socket failed
DETAIL: Connection timed out
LOG: client socket of dns:port Linux is closed
LOG: new outbound connection to dns:port
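
To illustrate the timing mismatch described above, here is a minimal Python sketch. The 7200 s keepalive time and the 60-minute firewall timeout are the values from this message. The interval (75 s) and probe count (9) are the usual Linux defaults and are only assumed here, as is the 600 s alternative. The function names are made up for this illustration.

```python
# Illustrative arithmetic only; function names are hypothetical.

def first_probe_beats_firewall(keepalive_time_s, firewall_timeout_s):
    """True if the first keepalive probe is sent before the firewall's idle timeout expires."""
    return keepalive_time_s < firewall_timeout_s


def dead_peer_detection_s(keepalive_time_s, keepalive_intvl_s, keepalive_probes):
    """Approximate idle time before the kernel declares an unresponsive peer dead."""
    return keepalive_time_s + keepalive_intvl_s * keepalive_probes


FIREWALL_TIMEOUT_S = 60 * 60  # 60 minutes, as configured on the firewall

# Current setting: the first probe goes out after 2 hours, long after
# the firewall has already dropped the idle session.
print(first_probe_beats_firewall(7200, FIREWALL_TIMEOUT_S))  # False

# An assumed lower value (10 minutes) would keep the session refreshed.
print(first_probe_beats_firewall(600, FIREWALL_TIMEOUT_S))   # True

# With assumed Linux defaults for interval and probe count:
print(dead_peer_detection_s(7200, 75, 9))  # 7875
```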

For info, it's a 3-node cluster spread across 3 different sites.
So, when this happens on a "normal day", it has no impact on the service.
But when it occurs in the middle of a failover (for instance, after losing the Master Pgpool node), during the election of the new Master, we end up in a split-brain situation caused by the lost connection between the 2 standby nodes.
The cluster then shuts down since the quorum is no longer met.

So, my questions are:
1- Is there any way to keep the client socket active and alive between the standby nodes?
2- Is there a tcp_keepalive configuration on the Pgpool-II side? Or should we modify the system's default configuration (currently tcp_keepalive_time = 7200)?
3- Could you please share your insight on the impact of modifying the system's tcp_keepalive parameters (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes) so that keepalive probes are sent in less than an hour (the timeout configured on the firewall is 60 min)?
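
For context on what the three parameters control, here is a generic Python sketch of the per-socket equivalents on Linux. It does not show how Pgpool-II configures its own sockets, which is the open question above. The system-wide tcp_keepalive_* values only apply to sockets that have SO_KEEPALIVE enabled, and a program may override them per socket. The helper name and the values (600 s idle, 60 s interval, 5 probes) are assumptions chosen for illustration.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Enable TCP keepalive on one socket (hypothetical helper, illustrative values)."""
    # Without SO_KEEPALIVE the kernel sends no keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific per-socket overrides of the system-wide settings:
    #   TCP_KEEPIDLE  <-> tcp_keepalive_time
    #   TCP_KEEPINTVL <-> tcp_keepalive_intvl
    #   TCP_KEEPCNT   <-> tcp_keepalive_probes
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# Confirm that keepalive is now switched on for this socket.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

With values like these, an idle connection would be probed every 10 minutes, well inside a 60-minute firewall timeout.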

Please advise.

Best Regards,
nissabissa
