Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-09-13 04:22:08
Message-ID: 003b01cd9167$5735e020$05a1a060$@kapila@huawei.com
Lists: pgsql-bugs pgsql-hackers

On Wednesday, September 12, 2012 10:15 PM Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:54 PM, <amit(dot)kapila(at)huawei(dot)com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 7534
>> Logged by: Amit Kapila
>> Email address: amit(dot)kapila(at)huawei(dot)com
>> PostgreSQL version: 9.2.0
>> Operating system: Suse 10
>> Description:
>
>> 1. Both master and standby machines are connected normally.
>> 2. Then use the command: ifconfig ip down; to bring down the network cards
>> of the master and standby.
>
>> Observation
>> The master detects the broken connection, but the standby does not, and it
>> keeps showing the channel as connected for a long time.

> What about setting keepalives_xxx libpq parameters?
>
> http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
>
> Keepalives are not a perfect solution for the termination of connection, but
> it would help to a certain extent.

We tried enabling keepalives, but that didn't work, maybe because
walreceiver is trying to send the receiver status.
That send eventually fails, but only after many attempts.
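
For reference, a minimal sketch of the kind of keepalive settings we mean. The parameter names (keepalives, keepalives_idle, keepalives_interval, keepalives_count) are the documented libpq connection keywords; the host, user, and numeric values are placeholders for illustration, not our actual configuration. The detection estimate only applies to an idle connection. As far as I understand, when unacknowledged data is pending on the socket, TCP retransmission timers apply instead of keepalive probes, which would be consistent with the behaviour described above.

```python
def build_conninfo(host, port=5432, user="replicator",
                   idle=10, interval=5, count=3):
    """Build a libpq connection string with TCP keepalives enabled.

    host, user and the default values are illustrative assumptions.
    """
    params = {
        "host": host,
        "port": port,
        "user": user,
        "keepalives": 1,                  # turn client-side keepalives on
        "keepalives_idle": idle,          # seconds of idleness before the first probe
        "keepalives_interval": interval,  # seconds between unanswered probes
        "keepalives_count": count,        # unanswered probes before the connection is dropped
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def idle_detection_seconds(idle, interval, count):
    """Rough time to declare an *idle* connection dead with these settings."""
    return idle + interval * count


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, used here as a placeholder.
    print(build_conninfo("192.0.2.10"))
    print(idle_detection_seconds(10, 5, 3))
```

The resulting string is what would go into primary_conninfo in recovery.conf on the standby.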

> If you need something like walreceiver-version of replication_timeout,
> such feature has not been implemented yet.
> Please feel free to implement that!

I would like to implement such a feature for walreceiver, but there is one
open question: whether to use the same configuration parameter
(replication_timeout) for walreceiver as for the master, or to introduce a new
configuration parameter (receiver_replication_timeout).

The only reason to have different timeout parameters for walsender and
walreceiver is the case of a standby that runs both a walreceiver and a
walsender (to send logs to a cascaded standby). In that case somebody might
want different timeout values for walsender and walreceiver.
OTOH, having too many parameters will create confusion. My opinion is to
have one timeout parameter for both walsender and walreceiver.
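
To make the two options concrete, here is a rough sketch in Python rather than backend C. It is only an illustration of the idea, not a proposed patch: the class ReceiverTimeout and the function effective_timeouts are hypothetical names, and I am assuming the receiver would follow the existing replication_timeout convention where a value of zero disables the check.

```python
import time


class ReceiverTimeout:
    """Hypothetical receiver-side timeout: give up on the connection when
    nothing has arrived from the sender for timeout_ms milliseconds."""

    def __init__(self, timeout_ms, now=time.monotonic):
        self.timeout_s = timeout_ms / 1000.0
        self._now = now                  # injectable clock, useful for testing
        self.last_received = now()

    def message_received(self):
        """Call whenever any message from the sender arrives."""
        self.last_received = self._now()

    def expired(self):
        """True when the sender has been silent for the whole timeout."""
        if self.timeout_s <= 0:          # assumed: zero disables the mechanism
            return False
        return self._now() - self.last_received >= self.timeout_s


def effective_timeouts(replication_timeout_ms, receiver_timeout_ms=None):
    """Resolve the timeouts for both processes on one server.

    With receiver_timeout_ms left unset, one parameter governs both sides
    (the single-parameter option). Passing it models a separate
    receiver_replication_timeout for a cascading standby.
    """
    if receiver_timeout_ms is None:
        receiver_timeout_ms = replication_timeout_ms
    return {"walsender": replication_timeout_ms,
            "walreceiver": receiver_timeout_ms}
```

With a single parameter, effective_timeouts(60000) gives both processes 60 seconds. The second argument is only needed on a cascading standby that wants the two values to differ.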

Let me know your suggestions/opinions on this.

Note: I am cc'ing pgsql-hackers, as this will be a feature request.

With Regards,
Amit Kapila.
