
Re: [HACKERS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-bugs(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-10-09 13:04:31
Message-ID: 00ae01cda61e$9fe90290$dfbb07b0$@kapila@huawei.com
Lists: pgsql-bugs, pgsql-hackers
On Tuesday, October 09, 2012 6:00 PM Robert Haas wrote:
> On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> > How about following:
> > 1. replication_client_timeout -- shouldn't it be client as new configuration is for wal receiver
> > 2. replication_standby_timeout
> 
> ISTM that the client and the standby are the same thing.

Yes, they are the same, but replication_standby_timeout may be easier for
users to understand.

 
> > If we introduce a new parameter for wal receiver, wouldn't
> > replication_timeout be confusing for user?
> 
> Maybe.  

> I actually don't think that I understand what problem we're
> trying to solve here.  If the connection between the master and the
> standby is lost, shouldn't the standby realize that it's no longer
> receiving keepalives from the master and terminate the connection? 

For the WAL receiver, keepalives are just one more kind of message. When it
finds that it has not received any message, it sends a reply/feedback message
to the master every wal_receiver_status_interval. So the WAL receiver sends a
reply at each wal_receiver_status_interval, yet the socket send() does not
fail. It fails only after many send() calls: until the socket's internal
buffer is full, outgoing data keeps accumulating there even though the other
side has not received it.
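The buffering effect described above can be reproduced locally with a minimal Python sketch. It assumes a connected `socket.socketpair()` (a local stream socket, not a real TCP link between a master and a standby) whose peer end never reads: each `send()` succeeds as long as the kernel can queue the bytes, and only once the send buffer is full does a non-blocking send fail. The function name and the chunk size are hypothetical, chosen only for illustration.

```python
import socket


def bytes_queued_before_send_blocks(chunk_size=1024):
    """Send on one end of a socket pair whose peer never reads.

    Returns how many bytes the kernel accepted before a non-blocking
    send() would have had to block. (Hypothetical helper for illustration.)
    """
    sender, peer = socket.socketpair()  # peer never calls recv()
    try:
        sender.setblocking(False)
        payload = b"x" * chunk_size  # stand-in for a reply/feedback message
        total = 0
        while True:
            try:
                # Succeeds while the send buffer still has room,
                # even though nothing is being received on the other side.
                total += sender.send(payload)
            except BlockingIOError:
                # Buffer full: the first point at which sending "fails".
                return total
    finally:
        sender.close()
        peer.close()


if __name__ == "__main__":
    queued = bytes_queued_before_send_blocks()
    print(f"{queued} bytes accepted although nothing was received")
```

The exact byte count depends on the operating system's socket buffer sizes, which is why a sender that relies only on send() errors can go a long time without noticing that the peer is gone.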
Because send() cannot be relied on to detect the breakdown promptly, we
decided to introduce a timeout parameter in the WAL receiver, similar to the
one the walsender already has.
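To make the proposal concrete, here is a minimal Python sketch of a receiver-side timeout in the spirit of the walsender's: the receiver records when it last received any message (keepalives included) and gives up once the silence exceeds a configured limit, instead of waiting for send() to fail. The class name, the `timeout_s` parameter, the injectable clock and the "0 disables the check" convention are all hypothetical illustrations, not PostgreSQL code or an actual configuration parameter.

```python
import time


class ReceiverTimeout:
    """Track time since the last message from the master (hypothetical sketch)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_received = clock()

    def on_message(self):
        # Any incoming message, including a keepalive, shows the link is alive.
        self.last_received = self.clock()

    def timed_out(self):
        # Assumed convention for this sketch: a timeout of 0 disables the check.
        if self.timeout_s <= 0:
            return False
        return self.clock() - self.last_received > self.timeout_s


# Usage with a fake clock (times in seconds):
now = [0.0]
t = ReceiverTimeout(timeout_s=60, clock=lambda: now[0])
now[0] = 30.0
t.on_message()
now[0] = 80.0
print(t.timed_out())  # False: only 50 s without a message
now[0] = 91.0
print(t.timed_out())  # True: 61 s without a message
```

The point of the design is that detection depends only on the receiving side's own clock, so it does not matter how much outgoing data the kernel is still willing to buffer.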

> I
> thought I had tested this at some point and it was working, so either
> it's subsequently gotten broken again or the scenario you're talking
> about is different in some way that I don't currently understand.

The standby takes much longer, around 15 minutes, to detect the breakdown,
whereas the master detects it within 2-3 minutes, mainly because of the
timeout functionality in the walsender.

With Regards,
Amit Kapila.




Copyright © 1996-2014 The PostgreSQL Global Development Group