Re: Largest PostgreSQL 8.x DB someone is running?

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Keaton Adams" <kadams(at)mxlogic(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Largest PostgreSQL 8.x DB someone is running?
Date: 2008-09-22 05:08:03
Message-ID: dcc563d10809212208p4214a0bejf36f668b0f3b815@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Sep 21, 2008 at 10:14 PM, Keaton Adams <kadams(at)mxlogic(dot)com> wrote:
> What is the the largest PostgreSQL 8.x database that is running in a
> production environment that you are aware of? We top out at roughly 400 GB
> but have a need for a new project to go much, much larger (in the several TB
> range). I am attempting to get a feel for how large one should take a
> single PostgreSQL database, given all of the operational concerns such as
> overall performance with a thousand+ concurrent users, times/space
> requirements for backups and restores, how to upgrade to newer upcoming
> versions of the software, etc. especially since there are no parallel
> operations/features in the product.
>
> Any information you can provide would be very helpful.

First and foremost, don't think of 8.0, 8.1, 8.2, 8.3 and the soon to
be released 8.4 as being the same "main branch". They're not. Each
is a major release in its own right. If you're going to be building a
new system then start with 8.3.4 (due out this week) as it's got major
performance improvements of the previous versions that make scaling
much easier.

Next, size is only important if you're operating on the whole dataset
all the time.

If you've got 1,000 users running "update table set field=field+100
where id=3" you can get by on a lot less horsepower than if you're
running "select avg(field) from table" with no where clause. The
first update updates one row, the select hits the whole table. So
your usage patterns will matter.

Where I work we have older machines with one hard drive running pg 8.1
at work that handle 600 to 1200 connections all alive at once, with
dozens and dozens active at the same time. But they're all like the
simple update above reading, updating, inserting, and deleting single
rows for a session manager.

OTOH, we have 8 CPU machines with lots of memory and hard drives, that
can be expanded, that handle several hundred concurrent operations
which are often hitting dozens to thousands of rows. Those machines
have to be bigger to handle the load.

I fail to see how the (possibly non-)issue you mention above of
parallelism would negatively affect postgresql from handling 1000s of
active backends. Splitting a single query to multiple CPUs is quite
likely to be counterproductive in such an environment.

For backups of very large systems I'd look at either slony replication
slaves for backup, or PITR, or both.

Finally, compared to the commercial products on offer, if you had an 8
or 16 core machine and you had the licenses for all the cool stuff,
you could be looking at a yearly licensing fee well into the hundreds
of thousands of dollars. You can buy a lot of hardware to throw at a
problem for that price.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Reg Me Please 2008-09-22 05:17:36 Re: PDF Documentation for 8.3?
Previous Message Joshua D. Drake 2008-09-22 04:41:33 Re: Largest PostgreSQL 8.x DB someone is running?