Re: BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-bugs(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-10-11 10:17:11
Message-ID: 001601cda799$94dbf390$be93dab0$@kapila@huawei.com
Lists: pgsql-bugs pgsql-hackers

On Wednesday, October 10, 2012 9:15 PM Heikki Linnakangas wrote:
> On 04.10.2012 13:12, Amit kapila wrote:
> > The following changes were made to support replication timeouts in
> > both the sender and the receiver:
> >
> > 1. A new configuration parameter, wal_receiver_timeout, is added to
> > detect timeouts in the receiver task.
> > 2. The existing parameter replication_timeout is renamed to
> > wal_sender_timeout.
>
> Ok. The other option would be to have just one GUC; I'm open to
> bikeshedding on this one. On one hand, there's no reason the timeouts
> have to be the same, so it would be nice to have separate settings, but on
> the other hand, I can't imagine a case where a single setting wouldn't
> work just as well.

I think they need to be separate in the following case:

1. M1 (master), S1 (standby 1), S2 (standby 2).
2. S1 is a standby of M1, and S2 is a standby of S1; basically a simple case
of cascaded replication.
3. M1 and S1 are on a local network, but S2 is at a geographically
different location.
(That is, the network between M1 and S1 is fast, while the network
between S1 and S2 is very slow.)
4. In this case, the user might want to configure different timeouts for
the sender and the receiver on S1.

> Attached is an updated patch. I reverted the merging of message types
> and fixed a bunch of cosmetic issues. There was one bug: in the main
> loop of walreceiver, you send the "ping" message on every wakeup after
> enough time has passed since the last reception. That means that if the
> server doesn't reply promptly, you send a new ping message every 100 ms
> (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
> issue, but it was not quite as severe there because the naptime was
> longer. Fixed that.

Thanks.

>
> How does this look now?

The patch looks fine, and the test results are also fine.

With Regards,
Amit Kapila.
