Skip site navigation (1) Skip section navigation (2)

Re: Large Database Performance suggestions

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: "Joshua Marsh" <icub3d(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large Database Performance suggestions
Date: 2004-10-22 05:03:34
Message-ID: 1098421413.21035.78.camel@localhost.localdomain (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, 2004-10-21 at 21:14, Joshua Marsh wrote:
> Hello everyone,
> 
> I am currently working on a data project that uses PostgreSQL
> extensively to store, manage and maintain the data.  We haven't had
> any problems regarding database size until recently.  The three major
> tables we use never get bigger than 10 million records.  With this
> size, we can do things like storing the indexes or even the tables in
> memory to allow faster access.
> 
> Recently, we have found customers who are wanting to use our service
> with data files between 100 million and 300 million records.  At that
> size, each of the three major tables will hold between 150 million and
> 700 million records.  At this size, I can't expect it to run queries
> in 10-15 seconds (what we can do with 10 million records), but would
> prefer to keep them all under a minute.
> 
> We did some original testing and with a server with 8GB or RAM and
> found we can do operations on data file up to 50 million fairly well,
> but performance drop dramatically after that.  Does anyone have any
> suggestions on a good way to improve performance for these extra large
> tables?  Things that have come to mind are Replication and Beowulf
> clusters, but from what I have recently studied, these don't do so wel
> with singular processes.  We will have parallel process running, but
> it's more important that the speed of each process be faster than
> several parallel processes at once.


I'd assume that what's happening is that up to a certain data set size,
it all fits in memory, and you're going from CPU/memory bandwidth
limited to I/O limited.  If this is the case, then a faster storage
subsystem is the only real answer.  If the database is mostly read, then
a large RAID5 or RAID 1+0 array should help quite a bit.

You might wanna post some explain analyze of the queries that are going
slower at some point in size, along with schema for those tables etc...


In response to

pgsql-performance by date

Next:From: Dave CramerDate: 2004-10-22 11:38:49
Subject: Re: Large Database Performance suggestions
Previous:From: Tom LaneDate: 2004-10-22 04:12:44
Subject: Re: mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group