
Re: BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org"<pgsql-bugs(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org"<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-10-02 07:36:39
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C38285358F7@szxeml509-mbs
Lists: pgsql-bugs, pgsql-hackers
On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
On 21.09.2012 14:18, Amit kapila wrote:
> On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
> On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila<amit(dot)kapila(at)huawei(dot)com>  wrote:
>>>> Approach-2 :
>>>> Provide a variable wal_send_status_interval, such that if this is 0, then
>>>> the current behavior would prevail and if its non-zero then KeepAlive
>>>> message would be send maximum after that time.
>>>> The modified code of WALSendLoop will be as follows:
> <snip>
>>>> Which way you think is better or you have any other idea to handle.
>>> I think #2 is better because it's more intuitive to a user.
>> Please find a patch attached for implementation of Approach-2.

>So let's think how this should ideally work from a user's point of view.
>I think there should be just two settings: walsender_timeout and
>walreceiver_timeout. walsender_timeout specifies how long a walsender
>will keep a connection open if it doesn't hear from the walreceiver, and
>walreceiver_timeout is the same for walreceiver. The system should
>figure out itself how often to send keepalive messages so that those
>timeouts are not reached.

This implies that we should remove wal_receiver_status_interval. Currently it is also used
for the reply message the receiver sends for data received from the sender, which reports the point up to which the receiver has flushed. So if we remove this variable,
the receiver might start sending that message sooner than required.
Is that behavior okay?

>In walsender, after half of walsender_timeout has elapsed and we haven't
>received anything from the client, the walsender process should send a
>"ping" message to the client. Whenever the client receives a Ping, it
>replies. The walreceiver does the same; when half of walreceiver_timeout
>has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
>resets the timer in both ends, regardless of which side initiated it, so
>if e.g walsender_timeout < walreceiver_timeout, the client will never
>have to initiate a Ping message, because walsender will always reach the
>walsender_timeout/2 point first and initiate the heartbeat message.

Just to clarify: walsender should reset its timer after it gets the receiver's reply to the message it sent, and
walreceiver should reset its timer after sending the reply to the heartbeat message.
Similarly, the timers will be reset when the receiver is the one that sent the heartbeat message.
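
To make the timer handling concrete, here is a minimal sketch of the half-timeout rule as I understand it. All names (HeartbeatState, hb_check, hb_reset) are made up for illustration and are not from the existing code; timeout_ms stands for either walsender_timeout or walreceiver_timeout, and I assume 0 disables the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; not taken from the PostgreSQL source. */
typedef struct HeartbeatState
{
    int64_t timeout_ms;     /* walsender_timeout or walreceiver_timeout; 0 = disabled (assumption) */
    int64_t last_reset_ms;  /* time at which the timer was last reset */
    bool    ping_sent;      /* true while a heartbeat is outstanding */
} HeartbeatState;

typedef enum HeartbeatAction
{
    HB_NOTHING,     /* keep waiting */
    HB_SEND_PING,   /* half the timeout elapsed: send a heartbeat */
    HB_TIMED_OUT    /* full timeout elapsed: close the connection */
} HeartbeatAction;

/* Call whenever a Ping-Pong roundtrip completes, whichever side started it. */
static void
hb_reset(HeartbeatState *hb, int64_t now_ms)
{
    hb->last_reset_ms = now_ms;
    hb->ping_sent = false;
}

/* Call from the main loop to decide what to do at time now_ms. */
static HeartbeatAction
hb_check(HeartbeatState *hb, int64_t now_ms)
{
    int64_t elapsed;

    if (hb->timeout_ms <= 0)
        return HB_NOTHING;

    elapsed = now_ms - hb->last_reset_ms;
    if (elapsed >= hb->timeout_ms)
        return HB_TIMED_OUT;

    /* Send at most one heartbeat per timer period. */
    if (elapsed >= hb->timeout_ms / 2 && !hb->ping_sent)
    {
        hb->ping_sent = true;
        return HB_SEND_PING;
    }
    return HB_NOTHING;
}
```

With this, whichever side has the smaller timeout reaches its half-way point first and initiates the heartbeat, and the other side's timer is reset before it ever needs to send one.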

>The Ping/Pong messages don't necessarily need to be new message types,
>we can use the message types we currently have, perhaps with an
>additional flag attached to them, to request the other side to reply

Can't we decide whether to send a reply immediately based on the message type, since these message types will be unique?

To clarify my understanding:
1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. The reply message from the walreceiver side will be the current reply message ('r').
3. Currently there is no reply-type message from walsender, so do we need to introduce a new message for it, or can we use an existing message?
    If new, do we need to send any additional information along with it? If existing, can we use the keepalive message itself as the reply, but with an additional byte
    to indicate that it is a reply?
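
As a rough illustration of the "additional byte" option, the sketch below uses a made-up two-byte layout: the message type followed by one flag byte that requests an immediate reply. The layout and the function names are assumptions for discussion only; the real keepalive message carries other fields that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_KEEPALIVE   'k'   /* heartbeat from walsender */
#define MSG_HS_FEEDBACK 'h'   /* heartbeat from walreceiver */

/*
 * Hypothetical wire layout: [type byte][reply-requested byte].
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t
encode_heartbeat(uint8_t *buf, size_t buflen, char type, bool reply_requested)
{
    if (buflen < 2)
        return 0;
    buf[0] = (uint8_t) type;
    buf[1] = reply_requested ? 1 : 0;
    return 2;
}

/* The peer replies immediately only when the flag byte is set. */
static bool
must_reply_immediately(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return false;
    return buf[1] != 0;
}
```

Under this scheme a reply could be the same message type with the flag cleared, so no new message type would be needed on the walsender side.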

With Regards,
Amit Kapila.
