Re: Replication server timeout patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication server timeout patch
Date: 2011-03-11 19:34:25
Message-ID: AANLkTinWGNZundjdF5asUfFL+gWefSGvV2g0NauFcCxa@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> I think we should consider making this change for 9.1.  This is a real
>> wart, and it's going to become even more of a problem with sync rep, I
>> think.
>
> Yeah, that's a welcome! Please feel free to review the patch.

I discussed this with Heikki on IM.

I think we should rip all the GUC change stuff out of this patch and
just decree that if you set a timeout, you get a timeout. If you set
this inconsistently with wal_receiver_status_interval, then you'll get
lots of disconnects. But that's your problem. This may seem a little
unfriendly, but the logic in here is quite complex and still isn't
going to really provide that much protection against bad
configurations. The only realistic alternative I see is to define
replication_timeout as a multiple of the client's
wal_receiver_status_interval, but that seems quite annoyingly
unfriendly. A single replication_timeout that applies to all slaves
doesn't cover every configuration someone might want, but it's simple
and easy to understand and should cover 95% of cases. If we find that
it's really necessary to be able to customize it further, then we
might go the route of adding the much-discussed standby registration
stuff, where there's a separate config file or system table where you
can stipulate that when a walsender with application_name=foo
connects, you want it to get wal_receiver_status_interval=$FOO. But I
think that complexity can certainly wait until 9.2 or later.
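
To illustrate the "if you set a timeout, you get a timeout" rule, here is a minimal sketch in Python. It is a toy model, not the walsender code: the function name count_disconnects and its parameters are hypothetical, the standby is assumed to send nothing but its periodic status messages, and it is assumed to reconnect immediately after being dropped.

```python
def count_disconnects(replication_timeout, status_interval, duration):
    """Return how many times the master would drop the standby in `duration` seconds.

    Toy model (assumption): all values are whole seconds, the standby sends
    only periodic status messages, and it reconnects immediately after a drop.
    """
    if replication_timeout <= 0:
        # 0 disables the timeout, so the master never drops the connection.
        return 0
    if 0 < status_interval <= replication_timeout:
        # Every status message arrives before the timer expires.
        return 0
    # The standby reports too rarely (or never, if status_interval is 0),
    # so each connection lives for exactly one timeout period.
    return duration // replication_timeout


if __name__ == "__main__":
    for timeout, interval in [(60, 10), (60, 120), (0, 120), (60, 0)]:
        drops = count_disconnects(timeout, interval, duration=600)
        print(f"timeout={timeout}s interval={interval}s -> {drops} disconnects in 10 min")
```

In this model a consistent pair of settings never disconnects, while an interval longer than the timeout bounces the standby once per timeout period, which is the "lots of disconnects" case described above.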

I also think that the default for replication_timeout should not be 0.
Something like 60s seems about right. That way, if you just use the
default settings, you'll get pretty sane behavior - a connectivity
hiccup that lasts more than a minute will bounce the client. We've
already gotten reports of people who thought they were replicating
when they really weren't, and had to fiddle with settings and struggle
to try to make it robust. This should make things a lot nicer for
people out of the box, but it won't if it's disabled out of the box.

On another note, there doesn't appear to be any need to change the
return value of WaitLatchOrSocket().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
