| From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> | 
|---|---|
| To: | "'Magnus Hagander'" <magnus(at)hagander(dot)net> | 
| Cc: | <pgsql-bugs(at)postgresql(dot)org> | 
| Subject: | Re: BUG #7534: walreceiver takes long time to detect n/w breakdown | 
| Date: | 2012-09-13 04:00:24 | 
| Message-ID: | 003a01cd9164$4e199930$ea4ccb90$@kapila@huawei.com | 
| Lists: | pgsql-bugs pgsql-hackers | 
On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote:
> On Wed, Sep 12, 2012 at 1:54 PM,  <amit(dot)kapila(at)huawei(dot)com> wrote:
>> The following bug has been logged on the website:
>
>> Bug reference:      7534
>> Logged by:          Amit Kapila
>> Email address:      amit(dot)kapila(at)huawei(dot)com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse 10
>> Description:
>
>> 1. Both master and standby machines are connected normally.
>> 2. Then use the command: ifconfig ip down; to bring down the network
>> card of master and standby.
>
>> Observation
>> The master detects the broken connection, but the standby does not, and
>> keeps showing the channel as connected for a long time.
> The master will detect it quicker, because it will get an error when
> it tries to send something.
> But the standby should detect it either when sending the feedback
> message (what's your wal_receiver_status_interval set to?) or when
> the kernel does (have you configured the tcp keepalive on the slave
> somehow?)
  wal_receiver_status_interval is 10s (we have not changed it; this is the
  default).
  We have also tried TCP keepalive, but it might not be able to detect the
  failure, because the receiver keeps trying to send its receiver status
  anyway.
  The send() socket call from XLogWalRcvSendReply() does fail, but only
  after it has been called many times. Internally, send() may keep
  accumulating data until the socket's internal buffer is full, even if the
  recv on the other side has not received the data.
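
  The buffering behaviour described above can be illustrated with a small
  sketch. It is not walreceiver code: it uses a local socketpair instead of
  a real TCP link, and the function name and chunk size are assumptions made
  for illustration. It only shows that send() keeps succeeding while the
  peer never calls recv(), until the kernel buffer fills up.

```python
import socket


def sends_accepted_without_reader(chunk_size=4096, max_calls=1_000_000):
    """Count how many send() calls succeed while the peer never reads.

    Hypothetical helper for illustration only. Returns (calls, bytes_accepted).
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)  # so a full buffer raises instead of blocking
    calls = 0
    accepted = 0
    try:
        while calls < max_calls:  # safety cap, not expected to be reached
            try:
                accepted += sender.send(b"x" * chunk_size)
                calls += 1
            except BlockingIOError:
                # Kernel buffer is full; only now does the sender notice
                # that the data is not being drained.
                break
    finally:
        sender.close()
        receiver.close()
    return calls, accepted


if __name__ == "__main__":
    calls, accepted = sends_accepted_without_reader()
    print(f"{calls} send() calls accepted {accepted} bytes with no recv()")
```

  On a real TCP connection over a dead link, the eventual error would come
  from the kernel's retransmission timeout rather than from a full buffer
  alone, which is consistent with the long delay reported here.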
  Also, walsender detects the failure because of the replication_timeout
  parameter, not because of a send failure.
  So in my opinion, the foolproof solution would be to have a mechanism in
  walreceiver similar to walsender's replication_timeout.
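
  A minimal sketch of such a receiver-side timeout follows. It is only an
  illustration of the idea, not a proposed patch: the class and method names
  are hypothetical, and treating a timeout of zero as "disabled" is an
  assumption that mirrors how replication_timeout behaves.

```python
import time


class ReceiveTimeout:
    """Track the last time anything arrived from the sender (hypothetical)."""

    def __init__(self, timeout_s, now=time.monotonic):
        self.timeout_s = timeout_s
        self._now = now  # injectable clock, mainly for testing
        self.last_received = now()

    def note_received(self):
        """Call whenever any message arrives from the other side."""
        self.last_received = self._now()

    def expired(self):
        """True if nothing has arrived for at least timeout_s seconds."""
        if self.timeout_s <= 0:
            return False  # assumption: zero disables the check
        return self._now() - self.last_received >= self.timeout_s
```

  The receiver loop would call note_received() on every incoming message
  and drop the connection once expired() returns true, so detection no
  longer depends on a send() call eventually failing.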
> Oh, and what do you actually mean by "long time"?
  15-20 mins.
-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/