Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: lizenko79(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol
Date: 2017-06-27 20:16:53
Message-ID: 20527.1498594613@sss.pgh.pa.us
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> lizenko79(at)gmail(dot)com wrote:
>> I've got the following message running PostgreSQL 9.6.3 on Solaris 11.3
>> (both latest stable).
>> getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

> It sounds like your system defines the TCP_KEEPALIVE symbol at compile
> time but the kernel doesn't know it; maybe the package was compiled in a
> system where the kernel does support that option, and you're running it
> in one that doesn't?

Actually, I find the same error in the logs for our Solaris buildfarm
members. So apparently that's been going on since day one, and we
hadn't noticed it, though I now find that it's been reported before:
https://www.postgresql.org/message-id/CAJgtxT6QL0_Gt+TkSDw=q1=YVJkT73FoSrtStcu5Hy+-SXn8rw@mail.gmail.com

Some googling turned up the tcp(7P) man page for Solaris 11:
https://docs.oracle.com/cd/E36784_01/html/E36884/tcp-7p.html#REFMAN7tcp-7p

and it says this:

SunOS supports the keep-alive mechanism described in RFC 1122. It is
enabled using the socket option SO_KEEPALIVE. When enabled, the first
keep-alive probe is sent out after a TCP is idle for two hours. If the
peer does not respond to the probe within eight minutes, the TCP
connection is aborted. You can alter the interval for sending out the
first probe using the socket option TCP_KEEPALIVE_THRESHOLD. The option
value is an unsigned integer in milliseconds. The system default is
controlled by the TCP ndd parameter tcp_keepalive_interval. The minimum
value is ten seconds. The maximum is ten days, while the default is two
hours. If you receive no response to the probe, you can use the
TCP_KEEPALIVE_ABORT_THRESHOLD socket option to change the time threshold
for aborting a TCP connection. The option value is an unsigned integer
in milliseconds. The value zero indicates that TCP should never time out
and abort the connection when probing. The system default is controlled
by the TCP ndd parameter tcp_keepalive_abort_interval. The default is
eight minutes.

So apparently, Linux's TCP_KEEPIDLE corresponds to Solaris'
TCP_KEEPALIVE_THRESHOLD. TCP_KEEPINTVL and TCP_KEEPCNT seem to have no
direct equivalent, although TCP_KEEPALIVE_ABORT_THRESHOLD would correspond
to their product.
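
To make the mapping concrete, here is a minimal sketch of how a caller might pick between the two option names at compile time. It is not the actual PostgreSQL code; the helper names (secs_to_ms, abort_threshold_ms, set_keepalive_idle) are hypothetical, and the unit conversion assumes the idle time is configured in seconds, as TCP_KEEPIDLE expects, while the Solaris options take unsigned milliseconds per the man page quoted above.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Convert seconds to the unsigned millisecond value the Solaris options take. */
static unsigned int
secs_to_ms(int secs)
{
    return (unsigned int) secs * 1000U;
}

/*
 * Rough Solaris stand-in for Linux's TCP_KEEPINTVL and TCP_KEEPCNT:
 * TCP_KEEPALIVE_ABORT_THRESHOLD corresponds to their product.
 */
static unsigned int
abort_threshold_ms(int interval_secs, int count)
{
    return secs_to_ms(interval_secs) * (unsigned int) count;
}

/*
 * Set the idle time before the first keepalive probe (hypothetical helper).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_keepalive_idle(int sock, int idle_secs)
{
#if defined(TCP_KEEPIDLE)
    /* Linux-style name: value is in seconds. */
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
                      &idle_secs, sizeof(idle_secs));
#elif defined(TCP_KEEPALIVE_THRESHOLD)
    /* Solaris name: value is an unsigned integer in milliseconds. */
    unsigned int idle_ms = secs_to_ms(idle_secs);

    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPALIVE_THRESHOLD,
                      &idle_ms, sizeof(idle_ms));
#else
    /* Neither name is known at compile time. */
    (void) sock;
    (void) idle_secs;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

With these helpers, an interval of 10 seconds and a count of 6 would translate to an abort threshold of 60000 ms, and the two-hour default idle time to 7200000 ms.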

I suggest that we ought to expand the keepalive code to know about this
synonym.

regards, tom lane