Re: Pgpool-II - Tcp session time out between standby nodes

From: Bo Peng <pengbo(at)sraoss(dot)co(dot)jp>
To: Nisrine Abdou <nisrine(dot)abdou(at)hotmail(dot)com>, "pgpool-general(at)lists(dot)postgresql(dot)org" <pgpool-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Pgpool-II - Tcp session time out between standby nodes
Date: 2025-09-17 23:18:38
Message-ID: TYWP286MB263382CB076C0BA450BB4403F217A@TYWP286MB2633.JPNP286.PROD.OUTLOOK.COM
Lists: pgpool-general

Hi,

Pgpool-II (watchdog) periodically connects to the other Pgpool-II nodes,
so the watchdog process should not remain in an idle state.

To confirm whether the log lines you provided are actually error messages output by Pgpool-II,
could you please share the full log messages?

---
Bo Peng <pengbo(at)sraoss(dot)co(dot)jp>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
Mobile: 080-7752-0749
URL: https://www.sraoss.co.jp/

________________________________________
From: Nisrine Abdou <nisrine(dot)abdou(at)hotmail(dot)com>
Sent: Thursday, September 11, 2025 1:05
To: pgpool-general(at)lists(dot)postgresql(dot)org <pgpool-general(at)lists(dot)postgresql(dot)org>
Subject: Pgpool-II - Tcp session time out between standby nodes

Hi all,
 
I'm new here, and I'm no network/Linux expert, so please bear with me :)
 
We have an issue on our Pgpool-II cluster where TCP sessions between standby nodes time out on the firewall but are not dropped on the servers.
This happens because the system's tcp_keepalive_time parameter is greater than the idle timeout configured on the firewall.
As a result, the standby nodes realize that the TCP connection between them is lost only when the system sends out its keepalive probe, which is too late.
This results in the following recurring messages in the Pgpool-II log files:
 
LOG:  read from socket failed
DETAIL:  Connection timed out
LOG:  client socket of dns:port Linux is closed
LOG:  new outbound connection to dns:port
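
The timing mismatch described above can be sketched with a little arithmetic. This is only an illustration: the function names are hypothetical, 7200 s and 60 minutes are the values quoted in this message, and the probe interval (75 s) and probe count (9) are the usual Linux defaults, assumed here because the message does not state them.

```python
# Hypothetical helper names; only the arithmetic matters here.

def first_probe_beats_firewall(keepalive_time_s: int, firewall_idle_timeout_s: int) -> bool:
    """True if the first keepalive probe is sent before the firewall
    would expire an otherwise idle session."""
    return keepalive_time_s < firewall_idle_timeout_s


def dead_peer_detection_s(keepalive_time_s: int, intvl_s: int, probes: int) -> int:
    """Idle time after which the kernel gives up on an unreachable peer:
    the initial idle period plus every unanswered probe interval."""
    return keepalive_time_s + intvl_s * probes


KEEPALIVE_TIME_S = 7200          # tcp_keepalive_time quoted in this message
FIREWALL_IDLE_TIMEOUT_S = 60 * 60  # 60-minute firewall timeout quoted in this message
KEEPALIVE_INTVL_S = 75           # assumed Linux default for tcp_keepalive_intvl
KEEPALIVE_PROBES = 9             # assumed Linux default for tcp_keepalive_probes

# The first probe is due after 7200 s, but the firewall forgets the session after 3600 s.
print(first_probe_beats_firewall(KEEPALIVE_TIME_S, FIREWALL_IDLE_TIMEOUT_S))  # False

# Idle time before the socket is reported as timed out: 7200 + 75 * 9 seconds.
print(dead_peer_detection_s(KEEPALIVE_TIME_S, KEEPALIVE_INTVL_S, KEEPALIVE_PROBES))  # 7875
```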
 
For info, it's a 3-node cluster on 3 different sites.
So, when this happens on a "normal day", it has no impact on the service.
But when it occurs in the middle of a failover (for instance, after losing the Master Pgpool node), during the election of the new Master, we end up in a split-brain situation caused by the lost connection between the 2 standby nodes.
The cluster then shuts down because the quorum is no longer met.
 
So, my questions are:
1- Is there any way to keep the client socket active and alive between the standby nodes?
2- Is there a tcp_keepalive configuration on the Pgpool-II side? Or should we modify the system's default configuration (currently tcp_keepalive_time = 7200)?
3- Could you please give your insight on the impact of modifying the tcp_keepalive system parameters (tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes) so that keepalive probes are sent in less than an hour (the timeout configured on the firewall is 60 minutes)?
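
For context on questions 2 and 3: on Linux, an application can override the three system-wide keepalive parameters for an individual socket. The sketch below shows that generic mechanism only. It is not Pgpool-II code, and whether the watchdog sets these options is not established in this thread. The helper name and the values (600 s idle, 60 s interval, 5 probes) are hypothetical, chosen only so that the first probe is sent well inside a 60-minute firewall timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 600,
                     interval_s: int = 60, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket and, where the platform
    supports it, override the system-wide timing for that socket only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three options are Linux-specific counterparts of
    # tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


# Usage: configure a socket before connecting it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
sock.close()
```

The system-wide parameters only apply to sockets that have SO_KEEPALIVE enabled, so lowering tcp_keepalive_time changes probe timing for every such connection on the host that does not set its own per-socket values.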
 
Please advise.
 
Best Regards,
nissabissa
