I am currently working on a data project that uses PostgreSQL
extensively to store, manage and maintain the data. We haven't had
any problems regarding database size until recently. The three major
tables we use never get bigger than 10 million records. With this
size, we can do things like storing the indexes or even the tables in
memory to allow faster access.
Recently, we have found customers who want to use our service
with data files of between 100 million and 300 million records. At that
size, each of the three major tables will hold between 150 million and
700 million records. With tables that large, I can't expect queries to
run in 10-15 seconds (which is what we get with 10 million records), but
I would prefer to keep them all under a minute.
We did some initial testing on a server with 8GB of RAM and
found we can do operations on data files of up to 50 million records
fairly well, but performance drops dramatically after that. Does anyone
have any suggestions on a good way to improve performance for these
extra large tables? Things that have come to mind are replication and
Beowulf clusters, but from what I have recently read, these don't do so
well with single processes. We will have parallel processes running, but
it matters more that each individual process runs fast than that
several can run in parallel.
Any help would be greatly appreciated!
P.S. Off-topic, I have a few invitations to gmail. If anyone would
like one, let me know.