Re: High Availability for PostgreSQL

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: High Availability for PostgreSQL
Date: 2004-10-08 22:34:26
Message-ID: m3ekk8zvnx.fsf@wolfe.cbbrowne.com
Lists: pgsql-admin

Centuries ago, Nostradamus foresaw when schooleys(at)co(dot)kern(dot)ca(dot)us ("Sharon Schooley") would write:
> We are looking for a 24/7 PostgreSQL solution. I've read some
> postings and information on various solutions including Taygeta,
> pgpool, and Mammoth PostgreSQL with Heartbeat.  If there are any
> users of these or other PostgreSQL high availability solutions out
> there, can you respond and let me know what is working for you and
> how long you have been up and running?    Our project is a high
> profile but simple application that must run 24/7.  We are a county
> government entity.  We are not currently PostgreSQL users.  Our o/s
> is Suse 9.0 Pro Server.

The issues likely have more to do with what kind of hardware you are
running than the software.

To build an HA system generally requires looking at _all_ the pieces,
to make sure they fit together well, including:

 - The database itself
 - Middleware
 - Hardware, including such things as:
   - Redundant servers
   - Servers containing redundant hardware (e.g. - extra CPUs,
     memory boards)
   - Network appliances
 - Applications using the stuff in the lower layers

Sometimes the middleware can hide hardware outages while backup
hardware "spins up"; if the applications are designed to be either
more or less forgiving of outages, that can either help or hurt.
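
As a rough illustration of middleware "hiding" an outage, here is a
minimal sketch that retries a failed operation a bounded number of
times while the backup comes up.  The names call_with_retry, attempts,
delay and retry_on are hypothetical; none of them come from the
products mentioned in this thread.

```python
import time


def call_with_retry(operation, attempts=5, delay=1.0,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a retryable error, wait and try again.

    Gives up and re-raises the last error once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller see the outage
            sleep(delay)  # give the backup hardware time to spin up
```

An application that is forgiving of outages would wrap its database
calls in something like this; one that is not would see the first
failure directly.  Whether retrying helps or hurts depends on how long
the failover really takes, which is worth measuring under load.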

Running a demanding application 168h/week on a set of hardware
infrastructure not designed for that will leave considerable risk of
embarrassing failures.

We've got people looking into AIX+HACMP for some applications; one
thing we discovered is that this (expensive) technology is likely to
make system reliability MUCH WORSE if it is not used properly.

What you'll need (similar to the HACMP efforts) is to have the time to
test your systems well under considerable load in order to figure out
where the "sharp edges" are, so that the system 'bleeds' a little in QA,
but runs well in production.

There will always be some of this, whatever set of technologies you
pick for any complex project.

There are always some unexpected local lessons to learn.

After we ran the Slony-I replication system for about a week, we
determined that it was _vital_ to do regular maintenance ("vacuuming")
on the internal table "pg_listener"; otherwise, system performance
would steadily get pretty bad.
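
For what it's worth, that kind of maintenance can be scripted.  The
following is only a sketch, assuming psql is on the PATH, the
PostgreSQL release is old enough to still have the pg_listener
catalog, and the script is run from cron or similar; the function
names and the database name "mydb" are hypothetical, and how often to
run it is something to find out under your own load.

```python
import subprocess


def vacuum_listener_command(dbname):
    """Build the psql command line that vacuums pg_listener in one database."""
    return ["psql", "-d", dbname, "-c", "VACUUM pg_listener;"]


def vacuum_listener(dbname, run=subprocess.run):
    """Run the vacuum; with check=True a non-zero psql exit raises an error."""
    return run(vacuum_listener_command(dbname), check=True)


if __name__ == "__main__":
    vacuum_listener("mydb")  # hypothetical database name
```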

Your application will have different patterns, but there will be some
[likely small] set of vital bottlenecks, hard to discover until it is
under load. Changing technologies will merely change which
bottlenecks you hit :-).
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/unix.html
There's a new language called C+++. The only problem is every time
you try to compile your modem disconnects.
