Re: BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Magnus Hagander'" <magnus(at)hagander(dot)net>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-09-13 04:00:24
Message-ID: 003a01cd9164$4e199930$ea4ccb90$@kapila@huawei.com
Lists: pgsql-bugs pgsql-hackers

On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote:
> On Wed, Sep 12, 2012 at 1:54 PM, <amit(dot)kapila(at)huawei(dot)com> wrote:
>> The following bug has been logged on the website:
>
>> Bug reference: 7534
>> Logged by: Amit Kapila
>> Email address: amit(dot)kapila(at)huawei(dot)com
>> PostgreSQL version: 9.2.0
>> Operating system: Suse 10
>> Description:
>
>> 1. The master and standby machines are connected normally.
>> 2. Run the command "ifconfig ip down" to take down the network card of
>> the master and the standby.
>
>> Observation
>> The master detects the broken connection, but the standby does not, and
>> it keeps showing the connection as established for a long time.

> The master will detect it quicker, because it will get an error when
> it tries to send something.

> But the standby should detect it either when sending the feedback
> message (what's your wal_receiver_status_interval set to?) or when
> the kernel does (have you configured the tcp keepalive on the slave
> somehow?)
wal_receiver_status_interval is 10s (the default; we have not changed it).
We have tried TCP keepalive as well, but it might not be able to detect the
failure, because the receiver keeps trying to send its status anyway.
The send socket call in XLogWalRcvSendReply() fails only after it has been
called many times: internally, send() probably keeps accumulating data until
the socket's internal buffer is full, even though the other side's recv has
not received the data.
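
To illustrate the buffering behaviour described above, here is a minimal C
sketch (not PostgreSQL code). It uses a local socketpair() instead of a TCP
connection over a broken network, and the function name
bytes_accepted_without_reader is made up for the example. It only shows that
send() keeps reporting success while the peer never reads, until the kernel
buffer fills up.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many bytes the kernel accepts from send()
 * while the peer never calls recv(). Returns -1 if the setup fails.
 */
static long
bytes_accepted_without_reader(void)
{
    int     fds[2];
    int     flags;
    char    chunk[1024];
    long    total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Non-blocking mode, so a full buffer reports an error instead of hanging. */
    flags = fcntl(fds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof(chunk));

    for (;;)
    {
        ssize_t n = send(fds[0], chunk, sizeof(chunk), 0);

        if (n > 0)
        {
            total += n;         /* accepted into the buffer, nobody read it */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* typically EAGAIN/EWOULDBLOCK: buffer is full */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

On a typical system the function returns a positive byte count, which means
every one of those send() calls succeeded although the other end received
nothing. The exact number depends on the kernel's socket buffer size.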
Also, walsender detects the failure because of the replication_timeout
parameter, not because of a send failure.
So in my opinion, the foolproof solution would be to have a mechanism
similar to walsender's replication_timeout in walreceiver.
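
A rough sketch of what such a receiver-side check could look like is below.
This is only an illustration of the idea, not a patch: the struct, the
function names and the millisecond clock are all assumptions, and none of
them exist in walreceiver today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for a receiver-side inactivity timeout. */
typedef struct ReceiverTimeoutState
{
    int64_t last_receive_ms;    /* when the last message arrived from the sender */
    int64_t timeout_ms;         /* allowed silence; 0 disables the check */
} ReceiverTimeoutState;

/* Call whenever any data arrives from the sender. */
static void
receiver_note_activity(ReceiverTimeoutState *st, int64_t now_ms)
{
    assert(st != NULL);
    st->last_receive_ms = now_ms;
}

/* Call from the receive loop; true means the connection should be dropped. */
static bool
receiver_timed_out(const ReceiverTimeoutState *st, int64_t now_ms)
{
    assert(st != NULL);
    if (st->timeout_ms <= 0)
        return false;           /* timeout disabled */
    return (now_ms - st->last_receive_ms) >= st->timeout_ms;
}
```

The receive loop would call receiver_note_activity() on every message and
receiver_timed_out() on every wakeup. The decision then depends only on how
long the sender has been silent, not on whether send() happens to fail.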

> Oh, and what do you actually mean by "long time"?
15-20 mins.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
