Re: Large Database Performance suggestions

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: "Joshua Marsh" <icub3d(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large Database Performance suggestions
Date: 2004-10-22 05:03:34
Message-ID: 1098421413.21035.78.camel@localhost.localdomain
Lists: pgsql-performance

On Thu, 2004-10-21 at 21:14, Joshua Marsh wrote:
> Hello everyone,
>
> I am currently working on a data project that uses PostgreSQL
> extensively to store, manage and maintain the data. We haven't had
> any problems regarding database size until recently. The three major
> tables we use never get bigger than 10 million records. With this
> size, we can do things like storing the indexes or even the tables in
> memory to allow faster access.
>
> Recently, we have found customers who are wanting to use our service
> with data files between 100 million and 300 million records. At that
> size, each of the three major tables will hold between 150 million and
> 700 million records. At this size, I can't expect it to run queries
> in 10-15 seconds (what we can do with 10 million records), but would
> prefer to keep them all under a minute.
>
> We did some original testing on a server with 8GB of RAM and found
> we can do operations on data files of up to 50 million records fairly
> well, but performance drops dramatically after that. Does anyone have
> any suggestions on a good way to improve performance for these extra
> large tables? Things that have come to mind are replication and
> Beowulf clusters, but from what I have recently studied, these don't
> do so well with singular processes. We will have parallel processes
> running, but it's more important for each individual process to be
> fast than for several processes to run in parallel at once.

I'd assume that what's happening is that up to a certain data set size
it all fits in memory, and beyond that you go from being CPU/memory
bandwidth limited to I/O limited. If that's the case, then a faster
storage subsystem is the only real answer. If the database is mostly
read, then a large RAID 5 or RAID 1+0 array should help quite a bit.
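
It's also worth double-checking the memory-related settings in
postgresql.conf for a box with 8GB of RAM and a mostly-read workload.
The numbers below are only illustrative assumptions, not tuned values
for your hardware, and units/defaults vary by version:

  # rough starting points only; adjust for your version and workload
  shared_buffers = 50000          # ~400MB of shared buffer cache (8kB pages)
  effective_cache_size = 700000   # ~5.5GB; tells the planner how much the OS will cache
  sort_mem = 16384                # per-sort memory in kB (renamed work_mem in 8.0)
  random_page_cost = 3            # lower than the default 4 if data is mostly cached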

You might wanna post the EXPLAIN ANALYZE output for the queries that
slow down as the data grows, along with the schema for those tables,
etc.
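
Something along these lines for each problem query would help (the
table and column names here are just placeholders, not your actual
schema):

  -- run inside psql; substitute one of your real slow queries
  EXPLAIN ANALYZE
  SELECT *
  FROM big_table
  WHERE customer_id = 12345;

  -- plus the table definition, e.g.
  \d big_table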
