Skip site navigation (1) Skip section navigation (2)

Re: Requirements for a database server

From: Ron Chmara <ron(at)Opus1(dot)COM>
To: Robert(dot)Farrugia(at)go(dot)com(dot)mt
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Requirements for a database server
Date: 2001-07-22 07:06:52
Message-ID: 01K67MQBICNI8XLZK9@Opus1.COM (view raw, whole thread or download thread mbox)
Lists: pgsql-admin
On Tuesday, July 17, 2001, at 10:52  PM, 
Robert(dot)Farrugia(at)go(dot)com(dot)mt wrote:
> Hi,
> I have been using postgres for the last year now.  The database 
> has grown
> from a mere few MBs to over 100GB data and expected to top up 
> 300GB by the
> end of the year.

Interesting. The most I've ever seen a PG data set was 87GB, but 
it was only

I'm trying to ask a question politely, which is:
"Can you improve efficiency be removing repetitive data, such as 
text  or varchar fields?"

As I read your post, I thought about a project I'm working on, 
where we took 180GB
and dropped it to 100Mb, simply because their db design stored a 
lot of data that was
repetitive. To put it another way, the english language only has 
around 20,000 words,
so even normal text fields can be optimized at some point with 
numeric keys. :-)

In their case, they were storing every text "GET" string 
recieved by their web server,
and running LIKE searches on it (no, this is not a joke.). When 
their queries went from
 > 2 hours to <1 second, they were very confused. :-)

>   Lately performance of queries, inserts, updates has
> continued to grow worse as the dataset has grown larger, even 
> though most
> queries have indexes on them, while vacuuming the database has become a
> nightmare.  For this reason we were thinking of upgrading the system
> hardware.

That's one option. RAID controllers are cheap, these days... my 
standard db
box has at *least* 128Mb of RAID cache on it, which helps... 
depending on table
design, of course. Dell makes some nice, multi-raid, systems. If 
your table design
exceeds the amount of RAID cache, you may want to redesign your 
tables, or
look into a mainframe or two.

> Is there anyone out there who has databases of this sort ?  Can 
> he point me
> out some basic requirements for these sort of databases 
> (processors, ram,
> storage, etc) ?  Also would it be worthwile to migrate to 
> another database
> system which is more robust with this kind of amount of data (such as
> oracle) ?

We test-migrated the above-mentioned project to a testing 
platform with Oracle,
and then realized the $45K oracle license could be better spent 
on a ground up
redesign of the db, and saved $20K by the end of the project.

Of course, your case may be different, but in many cases, once a 
full data
set is gathered, there can be numerous optimizations found 
within the db design
that *weren't* apparent before.

Using more efficient libraries (such as compiled C instead of 
JDBC), or optimizing
queries (avoiding the evils of table scans) can improve 
performance 1000%.
Fields which used to be "unique enough" may have 10, 50, 100 
duplicate entires
in them, all text/char/varchar, and can be ruduced to 
higher-speed int() keys.
Queries that are built on complex, multi-data fields, can be 
split to more rows, or
tables (example: If you often query on the first letter in a 
name, make that an col
in the table, and query that). Compound data such as 
"email_address" can easily
be broken down even further, into username and domain tables, and a join
between the tables (so all email addresses come back, 
lightning fast).
Even basic things, like stripping off a leading alpha character 
from a product
code, can boost performance and reduce storage and throughput issues.

Of course, if you've already done all of that, you're down to 
raw performance.

Some general notes:
In most cases, SMP with postgres helps on concurrent queries, 
but doesn't do
a lot for single queries.
180/SCSI is the only way to go. IDE just doen't work in the 
best-performance cases.
Rip the OS down to the bare essentials. Don't even think of 
using a stock OS kernel,
if you don't have to.
If you have a front end for inserts, and another front end for 
queries, it may help to
reduce the front-end overhead. Often, there's a bottleneck which 
is assumed
to be the db engine, when it's really the access method.


ron(at)opus1(dot)com, 520-326-6109,
The opinions expressed in this email are not necessarily those 
of myself,
my employers, or any of the other little voices in my head.

In response to

pgsql-admin by date

Next:From: Luis SousaDate: 2001-07-23 08:14:44
Subject: Re: Operator *=
Previous:From: Tim BoringDate: 2001-07-22 03:05:12
Subject: Re: destroydb: command not found

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group