Behavior difference for walsender and walreceiver for n/w breakdown case

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Behavior difference for walsender and walreceiver for n/w breakdown case
Date: 2012-09-06 06:09:07
Message-ID: 005d01cd8bf6$206ea430$614bec90$@kapila@huawei.com
Lists: pgsql-hackers

I have observed that currently, in case of a network break between
master and standby, the walsender process is terminated quickly, whereas
the walreceiver detects the breakage only after a long time.
The main reason, as far as I can see, is the replication_timeout configuration
parameter: walsender checks replication_timeout, and if there has been no
communication from the other side for replication_timeout, it treats that as
a condition to terminate the walsender.
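
A minimal sketch of the kind of inactivity check described above, in C. The helper name peer_timed_out, its millisecond arguments, and treating a timeout of 0 as "disabled" are illustrative assumptions, not walsender's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelled on the replication_timeout behaviour:
 * returns true when nothing has been received from the peer for at
 * least timeout_ms milliseconds. A timeout of 0 is assumed to mean
 * "timeout disabled". */
bool peer_timed_out(int64_t now_ms, int64_t last_receive_ms, int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    assert(now_ms >= last_receive_ms);

    if (timeout_ms == 0)
        return false;            /* timeout mechanism switched off */

    /* Silence longer than the timeout is taken as a dead connection. */
    return now_ms - last_receive_ms >= timeout_ms;
}
```

The sender would call such a check on every wakeup and exit once it returns true; the point is that the decision is based on elapsed silence, not on a socket call failing.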
However, there is no such mechanism in walreceiver. It fails only in the send()
socket call from XLogWalRcvSendReply(), and only after that function has been
called many times: send() keeps succeeding until the socket's internal send
buffer is full, because data keeps accumulating there even though the other
side has not received it.
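
The send-buffer effect can be reproduced in a small, hypothetical C program. It uses a local AF_UNIX socket pair as a stand-in for the TCP connection (an assumption: TCP buffering and retransmission differ in detail), keeps writing to one end while the other end never reads, and counts how many bytes send() accepts before it would block. The function name bytes_buffered_before_block is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes to one end of a connected stream socket pair while the other end
 * never reads, and returns how many bytes the kernel accepted before a
 * non-blocking send() would have blocked. Returns -1 on unexpected error. */
long bytes_buffered_before_block(void)
{
    int sv[2];
    char chunk[1024];
    long total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Non-blocking, so the loop stops at EAGAIN instead of hanging. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof chunk, 0);
        if (n > 0) {
            assert((size_t) n <= sizeof chunk);
            total += n;          /* accepted, although sv[1] never called recv() */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;               /* buffer full: only now does the sender notice */
        total = -1;              /* any other error */
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

A positive return value shows that every one of those send() calls "succeeded" even though the peer read nothing, which matches the walreceiver symptom: the failure surfaces only when the buffer is exhausted.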

Shouldn't walreceiver have a mechanism so that it can detect a network
failure sooner?
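
One possible shape for such a mechanism, sketched in C: wait on the replication socket with poll() and a deadline, and treat prolonged silence from the master as a broken link. The names wait_for_peer and silent_peer_demo and the 50 ms demo timeout are hypothetical; a real walreceiver would also need the master to send messages regularly, so that silence actually indicates a failure.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if data (or an EOF/error condition) is pending, 0 if the peer
 * stayed silent for the whole interval, -1 on a poll() error. */
int wait_for_peer(int fd, int timeout_ms)
{
    struct pollfd pfd;

    assert(fd >= 0 && timeout_ms >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0;            /* silence: caller may treat the link as dead */
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry (this sketch restarts the full wait). */
    }
}

/* Usage demo: returns 1 if a silent peer is reported as 0 and a peer that
 * has written one byte is reported as 1, otherwise 0. */
int silent_peer_demo(void)
{
    int sv[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    ok = wait_for_peer(sv[0], 50) == 0;             /* nothing sent yet */
    ok = ok && write(sv[1], "k", 1) == 1
            && wait_for_peer(sv[0], 50) == 1;       /* one byte pending */

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Unlike relying on send() to fail, this detects the problem after a bounded, configurable interval, independent of how large the socket buffers are.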

Basic steps to observe the above behavior:
1. Master and standby machines are connected normally.
2. Bring down the network card of the master and the standby, using the
command: ifconfig ip down
Observation:
The master detects the abnormal connection, but the standby does not detect
it and shows the channel as connected for a long time.

With Regards,
Amit Kapila
