Re: Reliability recommendations

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: Jeremy Haile <jhaile(at)fastmail(dot)fm>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability recommendations
Date: 2006-02-15 17:19:04
Message-ID: 43F36288.3010302@modgraph-usa.com
Lists: pgsql-performance

Jeremy Haile wrote:
> We are a small company looking to put together the most cost effective
> solution for our production database environment. Currently in
> production Postgres 8.1 is running on this machine:
>
> Dell 2850
> 2 x 3.0 GHz Xeon 800 MHz FSB 2MB Cache
> 4 GB DDR2 400 MHz
> 2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
> 4 x 146 GB 10K SCSI RAID 10 (for postgres data)
> Perc4ei controller
>
> ... I sent our scenario to our sales team at Dell and they came back with
> all manner of SAN, DAS, and configuration costing as much as $50k.

Given what you've told us, a $50K machine is not appropriate.

Instead, think about a simple system with several clones of the database and a load-balancing web server, even if one machine could handle your load. If a machine goes down, the load balancer automatically switches to the others.

Look at the MTBF figures of two hypothetical machines:

Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average.
Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on average.

Now go out and buy three of the $2,000 machines. Use a load-balancer front-end web server that can send requests in round-robin fashion to a "server farm". Clone your database. In fact, clone the load balancer too, so that all three machines have all software and databases installed. Call these machines A, B, and C.

At any given time, your Machine A is your web front end, serving requests to databases on A, B and C. If B or C goes down, no problem - the system keeps running. If A goes down, you switch the IP address of B or C and make it your web front end, and you're back in business in a few minutes.
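The dispatch behaviour described above can be sketched in a few lines. This is only an illustration, not any particular load balancer's implementation: the class name RoundRobinBalancer and its methods are hypothetical, and a real front end would decide that a backend is down from health checks rather than from manual calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin dispatcher that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        # Endless A, B, C, A, B, C, ... iterator over the backend names.
        self._ring = cycle(self.backends)

    def mark_down(self, name):
        self.down.add(name)

    def mark_up(self, name):
        self.down.discard(name)

    def next_backend(self):
        # Look at each backend at most once per request, skipping dead ones.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("all backends are down")


lb = RoundRobinBalancer(["A", "B", "C"])
first_four = [lb.next_backend() for _ in range(4)]   # A, B, C, A
lb.mark_down("B")                                    # B fails
after_failure = [lb.next_backend() for _ in range(3)]  # C, A, C
```

With B marked down, requests simply alternate between the remaining machines. Only when every backend is down does the dispatcher give up.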

Now compare the reliability -- in order for this system to be disabled, you'd have to have ALL THREE computers down at the same time. With an MTBF of two years and a repair time of two days, each machine has roughly 99.726% uptime. The "MTBF" of the whole system, that is, the expected time until all three machines are down simultaneously, is on the order of 100,000 years! Of course, this is silly, machines don't last that long, but it illustrates the point: redundancy beats reliability (which is why RAID is so useful).

All for $6,000.
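The arithmetic behind these figures can be checked with a short script. It is a sketch under assumptions that are not in the original post: failures and repairs are independent and exponentially distributed, each machine is repaired independently of the others, and a year has 365 days. The function names are made up for the illustration.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time a single machine is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def mean_time_to_total_outage(mtbf_hours, mttr_hours):
    """Expected hours until three identical machines are all down at once.

    Closed-form solution of the Markov chain with states 0, 1, 2, 3 machines
    failed, failure rate lam per machine and repair rate mu per failed machine
    (assumed model, see the lead-in).
    """
    lam = 1.0 / mtbf_hours
    mu = 1.0 / mttr_hours
    return (11 * lam**2 + 7 * lam * mu + 2 * mu**2) / (6 * lam**3)


# Machine 1: MTBF 2 years, two days to fix. Machine 2: MTBF 100 years, one hour to fix.
cheap = availability(2 * HOURS_PER_YEAR, 48)
expensive = availability(100 * HOURS_PER_YEAR, 1)

# Fraction of time all three cheap machines are down at once.
all_three_down = (1 - cheap) ** 3

farm_years = mean_time_to_total_outage(2 * HOURS_PER_YEAR, 48) / HOURS_PER_YEAR

print(f"cheap machine uptime:     {cheap:.4%}")
print(f"expensive machine uptime: {expensive:.6%}")
print(f"all three cheap down:     {all_three_down:.2e} of the time")
print(f"farm mean time to outage: {farm_years:,.0f} years")
```

Under these assumptions the three-machine farm comes out at roughly 90,000 years between total outages, against 100 years for the single expensive machine, so the conclusion holds even though the exact figure depends on the model chosen.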

Craig
