Re: Reliability recommendations

From: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>
To: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
Cc: Jeremy Haile <jhaile(at)fastmail(dot)fm>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability recommendations
Date: 2006-02-15 17:32:16
Message-ID: 1140024736.9076.167.camel@archimedes
Lists: pgsql-performance

Machine 1: $2000
Machine 2: $2000
Machine 3: $2000

Knowing how to rig them together and maintain them in a fully
fault-tolerant way: priceless.

(Sorry for the off-topic post, I couldn't resist).

-- Mark Lewis

On Wed, 2006-02-15 at 09:19 -0800, Craig A. James wrote:
> Jeremy Haile wrote:
> > We are a small company looking to put together the most cost-effective
> > solution for our production database environment. Currently in
> > production Postgres 8.1 is running on this machine:
> >
> > Dell 2850
> > 2 x 3.0 GHz Xeon 800 MHz FSB 2 MB Cache
> > 4 GB DDR2 400 MHz
> > 2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
> > 4 x 146 GB 10K SCSI RAID 10 (for postgres data)
> > Perc4ei controller
> >
> > ... I sent our scenario to our sales team at Dell and they came back with
> > all manner of SAN, DAS, and configuration costing as much as $50k.
>
> Given what you've told us, a $50K machine is not appropriate.
>
> Instead, think about a simple system with several clones of the database and a load-balancing web server, even if one machine could handle your load. If a machine goes down, the load balancer automatically switches to the others.
>
> Look at the MTBF figures of two hypothetical machines:
>
> Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average.
> Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on average.
>
> Now go out and buy three of the $2,000 machines. Use a load-balancing front-end web server that can send requests in round-robin fashion to a "server farm". Clone your database. In fact, clone the load balancer too, so that all three machines have all software and databases installed. Call these machines A, B, and C.
>
> At any given time, your Machine A is your web front end, serving requests to databases on A, B and C. If B or C goes down, no problem - the system keeps running. If A goes down, you switch the IP address of B or C and make it your web front end, and you're back in business in a few minutes.
>
> Now compare the reliability -- in order for this system to be disabled, you'd have to have ALL THREE computers fail at the same time. With an MTBF of two years and a repair time of two days, each machine has a 99.726% uptime. The "MTBF", that is, the expected time until all three machines will fail simultaneously, is well over 100,000 years! Of course, this is silly, machines don't last that long, but it illustrates the point: Redundancy beats reliability (which is why RAID is so useful).
>
> All for $6,000.
>
> Craig
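For anyone who wants to check the arithmetic in Craig's message, here is a rough sketch of one way to do it. It is not from the original post. It assumes failures and repairs are independent with exponentially distributed times and that each machine is repaired on its own, which real hardware only approximates. The function names are made up for illustration.

```python
# Back-of-the-envelope availability model for N identical machines.
# Assumptions (not from the original post): independent failures,
# exponential failure and repair times, one repair crew per machine.

def uptime(mtbf, mttr):
    """Steady-state fraction of time a single machine is up."""
    return mtbf / (mtbf + mttr)


def all_down_fraction(mtbf, mttr, machines=3):
    """Fraction of time all machines are down at once (independent failures)."""
    return (1.0 - uptime(mtbf, mttr)) ** machines


def mean_time_to_total_outage(mtbf, mttr):
    """Expected time until three machines are down simultaneously,
    starting with all three up. Units are whatever mtbf/mttr use.

    Markov chain on the number of failed machines (0, 1, 2); state 3
    is the outage. T0, T1, T2 are the expected times to reach state 3.
    """
    lam = 1.0 / mtbf   # failure rate of one machine
    mu = 1.0 / mttr    # repair rate of one machine

    # State 2: one machine left (fails at lam), two under repair (2*mu).
    #   T2 = a2 + b2 * T1
    a2 = 1.0 / (lam + 2.0 * mu)
    b2 = 2.0 * mu / (lam + 2.0 * mu)

    # State 1: two machines up (2*lam), one under repair (mu).
    #   T1 = 1/r1 + (2*lam/r1) * T2 + (mu/r1) * T0
    # State 0: three machines up (3*lam).
    #   T0 = 1/(3*lam) + T1
    r1 = 2.0 * lam + mu
    rhs = 1.0 / r1 + (2.0 * lam / r1) * a2 + (mu / r1) / (3.0 * lam)
    coeff = 1.0 - (2.0 * lam / r1) * b2 - mu / r1
    t1 = rhs / coeff
    return 1.0 / (3.0 * lam) + t1


# Figures from the post: MTBF of 2 years, 2 days to repair.
mtbf_years = 2.0
mttr_years = 2.0 / 365.0
print("single-machine uptime:", uptime(mtbf_years, mttr_years))
print("fraction of time all three are down:",
      all_down_fraction(mtbf_years, mttr_years))
print("mean years until all three are down:",
      mean_time_to_total_outage(mtbf_years, mttr_years))
```

Under these assumptions the last figure lands at roughly 90,000 years, the same order of magnitude as the number quoted above. The exact value depends heavily on the independence assumption; a shared power feed, switch, or bad software push would pull it down a long way.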
