Re: Replication server timeout patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication server timeout patch
Date: 2011-02-11 21:38:30
Message-ID: AANLkTi=6WrTTM3yDfJneT7nKr8RDuS4wJvu0+kO7JSrk@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 11.02.2011 22:11, Robert Haas wrote:
>>
>> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina(at)acm(dot)org>  wrote:
>>>
>>> I split this out of the synchronous replication patch for independent
>>> review. I'm dashing out the door, so I haven't put it on the CF yet or
>>> anything, but I just wanted to get it out there...I'll be around in
>>> Not Too Long to finish any other details.
>>
>> This looks like a useful and separately committable change.
>
> Hmm, so this patch implements a watchdog, where the master disconnects the
> standby if the heartbeat from the standby stops for more than
> 'replication_[server]_timeout' seconds. The standby sends the heartbeat
> every wal_receiver_status_interval seconds.
>
> It would be nice if the master and standby could negotiate those settings.
> As the patch stands, it's easy to have a pathological configuration where
> replication_server_timeout < wal_receiver_status_interval, so that the
> master repeatedly disconnects the standby because it doesn't reply in time.
> Maybe the standby should report how often it's going to send a heartbeat,
> and master should wait for that long + some safety margin. Or maybe the
> master should tell the standby how often it should send the heartbeat?

I guess the biggest use case for that behavior would be when you have
two standbys, one of which sends a heartbeat and the other of which
doesn't. Then you really can't rely on a single timeout.
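
To illustrate the problem, here is a rough Python sketch (not code from
the patch). It assumes that a status interval of 0 means the standby
never sends a heartbeat, and the function and standby names are made up
for the example. It lists the healthy standbys that a single
server-wide timeout would still drop:

```python
from typing import Dict, List


def wrongly_disconnected(server_timeout: float,
                         status_intervals: Dict[str, float]) -> List[str]:
    """Return the standbys that one server-wide timeout would drop
    even though they are behaving exactly as configured."""
    dropped = []
    for name, interval in status_intervals.items():
        # Assumption: interval == 0 means "heartbeats disabled".
        # interval > server_timeout is the pathological case from the
        # quoted text: the timeout expires before the next heartbeat.
        if interval == 0 or interval > server_timeout:
            dropped.append(name)
    return dropped


# Two standbys: one sends a heartbeat every 10 s, the other sends none.
standbys = {"standby_a": 10.0, "standby_b": 0.0}
print(wrongly_disconnected(30.0, standbys))
```

Any timeout short enough to be useful for standby_a disconnects
standby_b, which is why one value can't serve both.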

Maybe we could change the server parameter to indicate what multiple
of wal_receiver_status_interval causes a hangup, and then change the
client to notify the server what value it's using. But that gets
complicated, because the value could be changed while the standby is
running.
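
A minimal sketch of the multiplier idea, again in Python and purely
illustrative: the class, the timeout_multiplier setting, and the
"report interval" message are all hypothetical, not part of the patch.
It assumes the standby re-reports its interval whenever the value
changes, and that an interval of 0 means no heartbeats are sent:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyWatchdog:
    """Per-standby watchdog state kept on the master (hypothetical)."""
    timeout_multiplier: float        # hypothetical server setting
    reported_interval: float = 0.0   # seconds; 0 assumed to mean no heartbeats
    last_heartbeat: float = 0.0      # time of the last message from the standby

    def report_interval(self, interval: float, now: float) -> None:
        # The standby announces (or re-announces) its status interval.
        # Restart the clock so a deadline computed from the old value
        # is not applied to the new one.
        self.reported_interval = interval
        self.last_heartbeat = now

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def deadline(self) -> Optional[float]:
        # No heartbeats expected means no deadline to enforce.
        if self.reported_interval <= 0:
            return None
        return self.last_heartbeat + self.timeout_multiplier * self.reported_interval

    def should_disconnect(self, now: float) -> bool:
        limit = self.deadline()
        return limit is not None and now > limit


wd = StandbyWatchdog(timeout_multiplier=3.0)
wd.report_interval(10.0, now=0.0)      # heartbeat every 10 s, hang up after 30 s
print(wd.should_disconnect(now=25.0))  # still inside the window
print(wd.should_disconnect(now=31.0))  # three intervals missed
```

The complication shows up in report_interval(): the master only learns
about a changed value if the standby sends it, so a change that never
arrives leaves the master enforcing a stale deadline.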

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
