Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-09-16 06:10:43
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828532916@szxeml509-mbs
Lists: pgsql-bugs pgsql-hackers

On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote:
On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote:
> On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>>
>> On Thursday, September 13, 2012 10:57 PM Fujii Masao
>> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao
>>> On Wed, Sep 12, 2012 at 8:54 PM, <amit(dot)kapila(at)huawei(dot)com> wrote:
>>>>>>> The following bug has been logged on the website:
>
>>>>>> I would like to implement such feature for walreceiver, but there is one
>>>>>> confusion that whether to use
>>>>>> same configuration parameter (replication_timeout) for walreceiver as for
>>>>>> master or introduce a new
>>>>>> configuration parameter (receiver_replication_timeout).
>>
>>>>>I like the latter. I believe some users want to set the different
>>>>>timeout values,
>>>>>for example, in the case where the master and standby servers are placed in
>>>>>the same room, but cascaded standby is placed in other continent.
>>
>>>> Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver.
>>>> The main changes are:
>>>> 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver.
>>>> 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit the walreceiver.
>>>> This is the same as the walsender functionality.
>>
>>>> As this is a feature, I am uploading the attached patch to the coming CommitFest.
>>
>>>> Suggestions/Comments?
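A minimal C sketch of the check described in point 2, under stated assumptions: timestamps are in milliseconds, the connection is considered established at t = 0, and a timeout of 0 disables the check (as replication_timeout does on the walsender side). The function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: should the walreceiver give up on the connection?
 * Assumes millisecond timestamps; timeout_ms <= 0 disables the check.
 */
static bool
wal_receiver_timed_out(int64_t now_ms, int64_t last_msg_ms, int timeout_ms)
{
    if (timeout_ms <= 0)
        return false;
    return now_ms - last_msg_ms >= timeout_ms;
}

/*
 * Hypothetical simulation: walk a sorted list of message arrival times,
 * checking once every tick_ms, and return the first moment at which the
 * walreceiver would exit, or -1 if it never times out before end_ms.
 */
static int64_t
first_exit_time(const int64_t *arrivals, int n, int timeout_ms,
                int tick_ms, int64_t end_ms)
{
    int64_t last_msg = 0;   /* connection assumed established at t = 0 */
    int     next = 0;

    assert(tick_ms > 0);
    for (int64_t now = 0; now <= end_ms; now += tick_ms)
    {
        /* consume every message that has arrived by "now" */
        while (next < n && arrivals[next] <= now)
            last_msg = arrivals[next++];
        if (wal_receiver_timed_out(now, last_msg, timeout_ms))
            return now;
    }
    return -1;
}
```

For example, with messages at 1 s, 2 s and 3 s and a 5 s timeout, the sketch exits at 8 s: five seconds after the last message was received.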
>
>>> You also need to change walsender so that it periodically sends the heartbeat
>>> message, like walreceiver does each wal_receiver_status_interval. Otherwise,
>>> walreceiver will detect the timeout wrongly whenever there is no traffic in the
>>> master.
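The false detection on an idle master can be illustrated with a small model. This is a sketch under assumptions: the master generates no WAL, so the only traffic is a periodic heartbeat every interval_ms (0 meaning the walsender sends nothing), and the walreceiver checks its timeout every tick_ms. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an idle master. Returns true if a walreceiver
 * with the given timeout would declare the link dead before end_ms.
 */
static bool
idle_link_times_out(int interval_ms, int timeout_ms, int tick_ms,
                    int64_t end_ms)
{
    int64_t last_msg = 0;   /* connection assumed established at t = 0 */

    assert(tick_ms > 0 && timeout_ms > 0);
    for (int64_t now = 0; now <= end_ms; now += tick_ms)
    {
        /* most recent heartbeat sent at or before "now", if any */
        if (interval_ms > 0)
            last_msg = now - now % interval_ms;
        if (now - last_msg >= timeout_ms)
            return true;
    }
    return false;
}
```

With no heartbeat at all, a 5 s timeout fires after 5 s even though the link is healthy; a heartbeat every 1 s avoids this, while a 10 s heartbeat against a 5 s timeout does not. In practice the interval should be comfortably smaller than the timeout, since this model ignores network delay.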
>
>> Doesn't current keepalive message from walsender will suffice that need?

>No. Though the keepalive interval should be smaller than the timeout, IIRC there is
>no way to specify the keepalive interval now.

Currently, AFAICS from the code, on an idle system walsender sends a keepalive after 10s, which is the hardcoded value of sleeptime.
You are right that, since it is not configurable, if somebody configures replication_timeout to a value lower than 10s, the logic will fail.

So is it okay to add a new configuration parameter, similar to wal_receiver_status_interval, and map it directly to sleeptime in the current code?
Then there will be no need for any new heartbeat message; the existing keepalive will suffice for that purpose.
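The proposal can be sketched in C as follows, assuming millisecond units. The parameter name wal_sender_keepalive_interval_ms is hypothetical (no name is proposed above), and 0 is assumed to mean "keep the current hardcoded 10s". keepalive_config_is_safe only illustrates the constraint discussed above, that the keepalive interval must be smaller than the receiver's timeout. In a real deployment the two values are set on different servers, so this is an administrator's rule rather than a check one process could perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Current hardcoded idle sleeptime in the walsender loop (10s). */
#define DEFAULT_KEEPALIVE_MS 10000

/*
 * Hypothetical configuration parameter; the name and semantics are
 * assumptions for illustration. 0 means "keep the old 10s default".
 */
static int wal_sender_keepalive_interval_ms = 0;

/* The value the walsender would use as its idle sleeptime. */
static int
walsender_sleeptime_ms(void)
{
    assert(wal_sender_keepalive_interval_ms >= 0);
    return wal_sender_keepalive_interval_ms > 0
        ? wal_sender_keepalive_interval_ms
        : DEFAULT_KEEPALIVE_MS;
}

/*
 * Keepalives must arrive more often than the receiver's timeout fires.
 * A receiver timeout <= 0 is assumed to mean "disabled".
 */
static bool
keepalive_config_is_safe(int receiver_timeout_ms)
{
    if (receiver_timeout_ms <= 0)
        return true;
    return walsender_sleeptime_ms() < receiver_timeout_ms;
}
```

With the default 10s sleeptime, a 5s receiver timeout is unsafe (the failure case described above); setting the hypothetical parameter to 2s makes the same timeout safe without introducing a new message type.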

With Regards,
Amit Kapila.
