Re: BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

From: mark <dvlhntr(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.
Date: 2011-02-08 21:36:09
Message-ID: AANLkTi=kg4NsYFSGKzKgoMF+RXy_QD4V51L_hWpFMfT4@mail.gmail.com
Lists: pgsql-bugs

On Sun, Jan 30, 2011 at 12:45 PM, mark <dvlhntr(at)gmail(dot)com> wrote:
>
>
>> -----Original Message-----
>> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
>> Sent: Sunday, January 30, 2011 12:19 PM
>> To: mark
>> Cc: pgsql-bugs(at)postgresql(dot)org
>> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
>> restarted manually in somecases.
>>
>> On Fri, Jan 28, 2011 at 1:03 PM, mark <dvlhntr(at)gmail(dot)com> wrote:
>> > When showing the setting on the slave or master all tcp_keepalive
>> settings
>> > (idle, interval and count) are showing 0;
>> >
>> > The config file shows interval and count commented out, but idle in
>> the
>> > config file is set to 2100.
>> >
>> > Possible that "show tcp_keepalive_idle;" isn't reporting accurately ?
>> (or a
>> > value that high isn't be accepted?)
>> >
>> > I have reloaded configs and still seeing 0's
>> >
>> >
>> >
>> > I assume you would suggest I turn that number down... a lot.
>>
>> Yeah, the defaults are way too long for our purposes.  The way to get
>> this set correctly, I think, is to set it in the primary_conninfo
>> stream on the slave.  You end up with something like this:
>>
>> primary_conninfo='host=blahblah user=bob keepalives_idle=XX
>> keepalives_interval=XX keepalives_count=XX'
>>
> Thanks I will try this on Monday and will report back if it fixes the
> problem. (however since I can't reproduce the issue on demand it might be a
> waiting game. Might not know for a month or so tho)
>
> -Mark
>
>
>> I'm of the opinion that we really need an application-level keepalive
>> here, but the above is certainly a lot better than nothing.

My streaming replication woes continue.

I made those changes in the recovery.conf file, but streaming
replication still stays broken after any sort of network interruption
until someone manually comes along and fixes things by restarting the
standby or, if it's been too long, resynchronizing the base backup.

I think it's a network interruption that is triggering the breakdown,
but I don't have anything to prove it.

wal_keep_segments is set to 250, which was supposed to give us a few
hours to fix the issue, but it seems we blew through that many last
night, so by the time someone got around to fixing it the standby was
too far behind.
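
As a rough sanity check on that headroom, here is a small sketch. It assumes the default 16 MB WAL segment size (a custom build may differ), and the WAL generation rate is purely hypothetical, since no rate was measured here. The function names are made up for illustration.

```python
# Rough estimate of how much time wal_keep_segments buys the standby.
# Assumption: default 16 MB WAL segments; the WAL rate below is hypothetical.

WAL_SEGMENT_MB = 16


def retained_wal_mb(wal_keep_segments, segment_mb=WAL_SEGMENT_MB):
    """Amount of WAL (in MB) the primary keeps around for standbys."""
    return wal_keep_segments * segment_mb


def hours_of_headroom(wal_keep_segments, wal_mb_per_hour, segment_mb=WAL_SEGMENT_MB):
    """Hours a standby can stay disconnected before it needs a new base backup."""
    return retained_wal_mb(wal_keep_segments, segment_mb) / wal_mb_per_hour


if __name__ == "__main__":
    keep = 250                       # value from this report
    hypothetical_rate = 1000         # MB of WAL per hour, made up for illustration
    print(retained_wal_mb(keep))     # 4000
    print(hours_of_headroom(keep, hypothetical_rate))  # 4.0
```

If the real WAL rate during a busy night is several times the daytime rate, the same 250 segments cover proportionally fewer hours, which would be consistent with blowing through them overnight.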

My #1 problem with this right now is that I can't seem to reproduce it
on demand with virtual machines in our development area.

This is the recovery.conf file. See any problems with it? Maybe I
didn't get some of the syntax right?

[postgres@<redacted> data9.0]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=<redacted> port=5432 user=postgres
keepalives_idle=30 keepalives_interval=30 keepalives_count=30'
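
For reference, a minimal sketch of what those three values imply. It assumes the usual TCP keepalive behaviour: the first probe goes out after keepalives_idle seconds of silence, further probes follow every keepalives_interval seconds, and the connection is declared dead after keepalives_count unanswered probes. The helper names are hypothetical, and the parser only handles simple unquoted key=value pairs.

```python
# Estimate how long a dead connection goes unnoticed with given keepalive settings.
# Assumption: detection time is roughly idle + interval * count.


def parse_conninfo(conninfo):
    """Split a simple conninfo string into a dict (no quoted values supported)."""
    return dict(token.split("=", 1) for token in conninfo.split())


def keepalive_detection_seconds(idle, interval, count):
    """Approximate seconds until an idle, dead connection is dropped."""
    return idle + interval * count


conninfo = (
    "host=<redacted> port=5432 user=postgres "
    "keepalives_idle=30 keepalives_interval=30 keepalives_count=30"
)
opts = parse_conninfo(conninfo)
worst_case = keepalive_detection_seconds(
    int(opts["keepalives_idle"]),
    int(opts["keepalives_interval"]),
    int(opts["keepalives_count"]),
)

if __name__ == "__main__":
    print(worst_case)        # 930
    print(worst_case / 60)   # 15.5
```

Under that assumption, 30/30/30 means roughly 15.5 minutes before the standby gives up on a dead connection and reconnects. A lower count (for example a hypothetical 30/10/3) would bring the estimate down to about one minute.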

thanks
..: Mark

P.S. Looking forward to 9.1, where a standby can be started with
streaming from scratch. That sounds nice.

>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
>
