From: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "Dmitry Samonenko *EXTERN*" <shreddingwork(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: libpq: indefinite block on poll during network problems |
Date: | 2014-05-27 10:35:53 |
Message-ID: | A737B7A37273E048B164557ADEF4A58B17CFE53A@ntex2010i.host.magwien.gv.at |
Lists: | pgsql-general |
Dmitry Samonenko wrote:
> I have an application which uses libpq to talk to a remote PostgreSQL 9.2.4 server. The client
> and server nodes run Linux, and the connection is established using TCP over IPv4. The client
> application has some small fault-tolerance features, which are activated when server-related
> problems are encountered.
>
> One day the network hardware failed and, long story short, the host with the PostgreSQL
> server got isolated. TCP messages routed to the server node were NOT delivered or acknowledged in
> any way. According to the debugger, the client application was blocked in libpq code.
>
> I have successfully reproduced the problem in a laboratory environment. These iptables commands
> should be run on the server node after some period of client <-> server interaction:
>
> # iptables -A OUTPUT -p tcp --sport 5432 -j DROP
> # iptables -A INPUT -p tcp --dport 5432 -j DROP
>
>
> I glanced over the master branch of the libpq sources and some questions arose. Namely:
>
> 1. The connection to the PostgreSQL server is made without an option to specify SO_RCVTIMEO and
> SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful?
>
> 2. PQexec ultimately leads to pqWait, which after some function calls "lands" in pqSocketCheck and
> pqSocketPoll. These two functions have a parameter end_time. It is set to -1 in the PQexec scenario,
> which leads to an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable
> timeout for PQexec calls? Are there existing features that should be used to handle a situation
> like this?
>
> Currently, I have changed the Linux kernel TCPv4 stack counters responsible for retransmission, so
> the OS actually closes the socket after some period. This is detected by the poll call in
> pqSocketPoll and libpq handles the situation correctly: an error is reported to my application.
> But it's just a workaround.
>
> So, this infinite poll situation looks like a flaw to me, and I think it should be considered
> a bug. Is it?
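The infinite wait described in point 2 comes down to calling poll() with a negative timeout. The following is a minimal sketch of the same call with a finite timeout; it is not libpq's code, and the helper name wait_readable is hypothetical. It only illustrates the system call involved.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper, not part of libpq: wait until poll() reports an
 * event on fd, giving up after timeout_ms milliseconds.
 * Returns 1 if an event (data, hangup or error) was reported,
 * 0 on timeout, -1 if poll() itself failed. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* A negative timeout would block forever; a non-negative one bounds
     * the wait. Note that a retry after EINTR restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc == 0 ? 0 : 1;
}
```

An application could presumably combine something like this with libpq's asynchronous interface (PQsendQuery, PQsocket, PQconsumeInput, PQgetResult) instead of the blocking PQexec, but that wiring is left out here.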
In PostgreSQL you can handle the problem of dying connections by setting the
tcp_keepalives_* parameters (see http://www.postgresql.org/docs/current/static/runtime-config-connection.html).
That should take care of the problem, right?
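For illustration, on Linux the tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count settings correspond to the socket options below. This is only a sketch of the mechanism, assuming a Linux system; the helper name enable_keepalive and the values used are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive probes on a socket (Linux).
 *   idle_s      seconds of silence before the first probe is sent
 *   interval_s  seconds between unanswered probes
 *   count       unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

On the client side you would not normally call this yourself: libpq accepts keepalives, keepalives_idle, keepalives_interval and keepalives_count as connection parameters, which should have the same effect on the client's socket.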
Yours,
Laurenz Albe