Re: recovery_connections cannot start (was Re: master in standby mode croaks)

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery_connections cannot start (was Re: master in standby mode croaks)
Date: 2010-04-26 12:06:29
Message-ID: 4BD581C5.70301@kaltenbrunner.cc
Lists: pgsql-hackers

Robert Haas wrote:
> On Fri, Apr 23, 2010 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Well, I think the real hole is that turning archive_mode=on results in
>>> WAL never being deleted unless it's successfully archived.
>> Hm, good point. And at least in principle you could have SR setups
>> that don't care about having a backing WAL archive.
>>
>>> But we might be able to handle that like this:
>>> wal_mode={standby|archive|crash} # or whatever
>>> wal_segments_always=<integer> # keep this many segments always, for
>>> SR - like current wal_keep_segments
>>> wal_segments_unarchived=<integer> # keep this many unarchived
>>> segments, -1 for infinite
>>> max_wal_senders=<integer> # same as now
>>> archive_command=<string> # same as now
>>> So we always retain wal_segments_always segments, but if we have
>>> trouble with archiving we'll retain up to wal_segments_unarchived.
>> And when that limit is reached, what happens? Panic shutdown?
>> Silently drop unarchived data? Neither one sounds very good.
>
> Silently drop unarchived data. I agree that isn't very good, but
> think about it this way: if archive_command is failing, then our log
> shipping slave is not going to work. But letting the disk fill up on
> the primary does not make it any better. It just makes the primary
> stop working, too. Obviously, all of this stuff needs to be monitored
> or you're playing with fire, but I don't think having a safety valve
> on the primary is a stupid idea.

Hmm, not sure I agree - you need to monitor disk space usage on a
system in general, for obvious reasons, and I don't think dealing with
that kind of thing is really in our realm. We are a relational database
and we need to guard the data; silently dropping data is, imho, not a
good idea.
Just picture the typical scenario: overnight maintenance on the standby
done by a sysadmin, while batch jobs on the master generate just enough
WAL to exceed the limit - and the sysadmin ends up having to call the
DBA in.
In general the real question is "will people set this to something
sensible, or rather to an absurdly high value just so their replication
never breaks?" - I suspect that in critical environments people will do
the latter...
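
For reference, the settings proposed upthread would look something like
this in postgresql.conf. To be clear: these parameter names are
hypothetical (none of them exist in a released server) and the values
are purely illustrative:

```
# Sketch of the proposal upthread - hypothetical GUCs, illustrative values.
wal_mode = 'standby'                # proposed: {standby|archive|crash}
wal_segments_always = 32            # always keep this many segments, for SR
wal_segments_unarchived = 256       # keep up to this many unarchived
                                    # segments; -1 for infinite
max_wal_senders = 3                 # same as now
archive_command = 'cp %p /mnt/archive/%f'   # same as now
```

The point of contention is what happens when the wal_segments_unarchived
limit is hit: drop unarchived segments (Robert's safety valve) or keep
them and risk filling the disk.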

Stefan
