Disclaimer: I am relatively new to RDBMSs, so please do not laugh at me
too loudly, you can laugh, just not too loudly and please do not point. :)
I am working on an Automated Installer Testing System for Adobe Systems
and I am doing a DB redesign of the current postgres db:
1. We are testing a matrix of over 900 Acrobat installer configurations
and we are tracking every file and registry entry that is affected by an
2. a single file or registry entry that is affected by any test is
stored in the db as a record.
3. a typical record is about 12 columns of string data. the data is all
information about a file (mac or windows) or windows registry entry [
file or regkey name, file size, modification date, checksum,
permissions, owner, group, and in the case of a mac, we are getting all
the hfs atts as well].
4. A typical test produces anywhere from 2000 - 5000 records.
Our db is getting to be a respectable size (about 10GB right now) and is
growing slower and slower. I have been charged with making it faster and
with a smaller footprint while retaining all of the current
functionality. here is one of my ideas. Please tell me if I am crazy:
The strings that we are storing (mentioned in 3 above) are extremely
repetitive. for example, there are a limited number of permissions for
the files in the acrobat installer and we are storing this information
over and over again in the tables. The same goes for filenames, registry
key names and almost all of the data we are storing. So it seems to me
that to create a smaller and faster database we could assign an integer
to each string and just store the integer representation of the string
rather than the string itself. Then we would just store the strings in
a separate table one time and do join queries against the tables that
are holding the strings and the main data tables. for example,
a table that would hold unique permissions strings would look like
string | id
'drwxr-xr-x' | 1
'-rw-------' | 2
'drwxrwxr-x' | 3
'-rw-r--r--' | 4
then in my data I would just store 1,2,3 or 4 instead of the whole
it seems to me that we would save lots of space and over time not see
the same performance degradation.
anyways, please tell me if this makes sense and make any other
suggestions that you can think of. I am just now starting this analysis
so I cannot give specifics as to where we are seeing poor performance
just yet. just tell me if my concepts are correct.
thanks for your time and for suffering this email.
In response to
pgsql-performance by date
|Next:||From: Jord Tanner||Date: 2003-11-25 18:58:28|
|Subject: Re: design question: general db performance|
|Previous:||From: Tom Lane||Date: 2003-11-25 17:07:49|
|Subject: Re: Where to start for performance problem? |