Re: Replication server timeout patch

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication server timeout patch
Date: 2011-03-16 07:49:29
Message-ID: AANLkTik3-GETvakKDTwNXC3OVUr+w3DFMiriG2aiTguy@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> I think we should consider making this change for 9.1.  This is a real
>>> wart, and it's going to become even more of a problem with sync rep, I
>>> think.
>>
>> Yeah, that's welcome! Please feel free to review the patch.
>
> I discussed this with Heikki on IM.
>
> I think we should rip all the GUC change stuff out of this patch and
> just decree that if you set a timeout, you get a timeout.  If you set
> this inconsistently with wal_receiver_status_interval, then you'll get
> lots of disconnects.  But that's your problem.  This may seem a little
> unfriendly, but the logic in here is quite complex and still isn't
> going to really provide that much protection against bad
> configurations.  The only realistic alternative I see is to define
> replication_timeout as a multiple of the client's
> wal_receiver_status_interval, but that seems quite annoyingly
> unfriendly.  A single replication_timeout that applies to all slaves
> doesn't cover every configuration someone might want, but it's simple
> and easy to understand and should cover 95% of cases.  If we find that
> it's really necessary to be able to customize it further, then we
> might go the route of adding the much-discussed standby registration
> stuff, where there's a separate config file or system table where you
> can stipulate that when a walsender with application_name=foo
> connects, you want it to get wal_receiver_status_interval=$FOO.  But I
> think that complexity can certainly wait until 9.2 or later.
>
> I also think that the default for replication_timeout should not be 0.
> Something like 60s seems about right.  That way, if you just use the
> default settings, you'll get pretty sane behavior - a connectivity
> hiccup that lasts more than a minute will bounce the client.  We've
> already gotten reports of people who thought they were replicating
> when they really weren't, and had to fiddle with settings and struggle
> to try to make it robust.  This should make things a lot nicer for
> people out of the box, but it won't if it's disabled out of the box.
>
> On another note, there doesn't appear to be any need to change the
> return value of WaitLatchOrSocket().

Agreed. I'll change the patch.
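
For illustration only, here is a minimal sketch of the simplified behavior proposed above: a single replication_timeout, with 0 meaning disabled, and no attempt to reconcile it with the standby's wal_receiver_status_interval. This is not the patch itself. The function names, the one-second polling loop, and the reconnect model are all assumptions made for the example.

```python
def standby_timed_out(last_reply_s: int, now_s: int, replication_timeout_s: int) -> bool:
    """Return True if the standby has been silent longer than the timeout.

    Hypothetical helper: a timeout of 0 is treated as "disabled".
    """
    if replication_timeout_s <= 0:
        return False
    return (now_s - last_reply_s) > replication_timeout_s


def count_disconnects(status_interval_s: int, replication_timeout_s: int,
                      duration_s: int) -> int:
    """Count how often a healthy standby would be dropped over duration_s seconds.

    Assumed model: the standby reports every status_interval_s seconds, the
    master checks once per second, and a dropped standby reconnects at once.
    """
    last_reply = 0
    next_reply = status_interval_s
    disconnects = 0
    for now in range(1, duration_s + 1):
        if now >= next_reply:
            # The standby sends its periodic status message.
            last_reply = now
            next_reply = now + status_interval_s
        elif standby_timed_out(last_reply, now, replication_timeout_s):
            # The master gives up; the standby reconnects and the clocks restart.
            disconnects += 1
            last_reply = now
            next_reply = now + status_interval_s
    return disconnects


if __name__ == "__main__":
    # Status interval well below the timeout: no spurious disconnects.
    print(count_disconnects(status_interval_s=10, replication_timeout_s=60, duration_s=600))
    # Status interval above the timeout: the standby is dropped repeatedly.
    print(count_disconnects(status_interval_s=120, replication_timeout_s=60, duration_s=600))
```

With a 60s timeout, a standby reporting every 10s is never dropped in this model, while one reporting every 120s is dropped over and over, which is the "lots of disconnects" case described in the quoted text.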

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
