On Tue, Apr 27, 2010 at 11:31 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Nikhil G. Daddikar wrote:
>> I was wondering if any of you are using (or tried to use) PG+EC2/EBS on a
>> production system. Are there any best practices? Googling didn't help much. A few
>> articles I came across scared me a bit.
> There have been a couple of reports of happy users:
I've been running a very busy 170+ GB OLTP Postgres database on Amazon for
1.5 years now. I can't say I'm "happy", but I've made it work and
still prefer it to running downtown to a colo at 3am when something
goes wrong.
> There are two main things to be wary of:
> 1) Physical I/O is not very good, which is why that first system used RAID0.
Let's be clear here, physical I/O is at times *terrible*. :)
If you have a larger database, the EBS volumes are going to become a
real bottleneck. Our primary database needs 8 EBS volumes in a RAID
array, and we use Slony to offload requests to two slave machines, and
it still can't really keep up.
There's no way we could run this database on a single EBS volume.
I also recommend you use RAID10, not RAID0. EBS volumes fail. More
frequently, single volumes will experience *very long* periods of poor
performance. The more drives you have in your RAID, the more you'll
smooth things out. However, there have been occasions where we've had
to swap out a poorly performing volume for a new one and rebuild the
RAID to get things back up to speed. You can't do that with RAID0.
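To put a rough number on the "smoothing" effect, here is a minimal sketch under a deliberately simple model: random OLTP requests are spread evenly across the volumes, each request is served by exactly one volume, and mirroring is ignored. The function name and the 5 ms / 100 ms latencies are hypothetical illustration values, not measurements from our system.

```python
def mean_latency_ms(n_volumes: int, healthy_ms: float, degraded_ms: float,
                    n_degraded: int = 1) -> float:
    """Average request latency when n_degraded of n_volumes are slow.

    Assumes requests are spread uniformly over the volumes and each
    request touches a single volume (a simplification of real striping).
    """
    if n_volumes < 1 or not 0 <= n_degraded <= n_volumes:
        raise ValueError("need 0 <= n_degraded <= n_volumes and n_volumes >= 1")
    healthy_share = (n_volumes - n_degraded) / n_volumes
    return healthy_share * healthy_ms + (1 - healthy_share) * degraded_ms


if __name__ == "__main__":
    # Hypothetical latencies: 5 ms when healthy, 100 ms when degraded.
    for n in (1, 2, 4, 8):
        print(f"{n} volume(s): {mean_latency_ms(n, 5.0, 100.0):.1f} ms average")
```

Under these assumptions, one bad volume out of eight drags the average far less than one bad volume out of one, but it never disappears, which is why swapping the volume out is still sometimes necessary.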
> 2) Reliability of EBS is terrible by database standards; I commented on this
> a bit already at
> http://archives.postgresql.org/pgsql-general/2009-06/msg00762.php The end
> result is that you must be careful about how you back your data up, with a
> continuous streaming backup via WAL shipping being the recommended approach.
> I wouldn't deploy into this environment in a situation where losing a
> minute or two of transactions in the case of an EC2/EBS failure would be
> unacceptable, because that's something that's a bit more likely to happen
> here than on most database hardware.
Agreed. We have three WAL-shipped spares. One streams our WAL files
to a single EBS volume which we use for worst case scenario snapshot
backups. The other two are exact replicas of our primary database
(one in the west coast data center, and the other in an east coast
data center) which we have for failover.
If we ever have to worst-case-scenario restore from one of our EBS
snapshots, we're down for six hours because we'll have to stream the
data from our EBS snapshot back over to an EBS RAID array. 170 GB at
20 MB/sec (if you're lucky) takes a LONG time. It takes 30 to 60
minutes for one of those snapshots to become "usable" once we create a
drive from it, and then we still have to bring up the database and
wait an agonizingly long time for hot data to stream back into memory.
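The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope calculation, assuming decimal units (1 GB = 1000 MB) and a sustained 20 MB/sec; the function names are made up for illustration. The copy alone comes to roughly 2.4 hours, and the snapshot wait adds another 30 to 60 minutes. The rest of the six-hour figure is database startup, cache warming, and the transfer rate often falling below the lucky case.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy size_gb at a sustained rate (1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s / 3600


def restore_hours(size_gb: float, rate_mb_per_s: float,
                  snapshot_ready_min: float) -> float:
    """Snapshot wait plus copy time; excludes startup and cache warming."""
    return snapshot_ready_min / 60 + transfer_hours(size_gb, rate_mb_per_s)


if __name__ == "__main__":
    print(f"copy only: {transfer_hours(170, 20):.1f} h")
    for wait_min in (30, 60):
        total = restore_hours(170, 20, wait_min)
        print(f"with {wait_min} min snapshot wait: {total:.1f} h")
```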
We had to fail over to one of our spares twice in the last 1.5 years.
Not fun. Both times were due to instance failure.
It's possible to run a larger database on EC2, but it takes a lot of
work, careful planning and a thick skin.