Skip site navigation (1) Skip section navigation (2)

design question: general db performance

From: shane hill <shill(at)adobe(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: design question: general db performance
Date: 2003-11-25 18:42:47
Message-ID: 3FC3A2A7.6060804@adobe.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi folks,

Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
too loudly,  you can laugh, just not too loudly and please do not point. :)

I am working on an Automated Installer Testing System for Adobe Systems 
and I am doing a DB redesign of the current postgres db:

1. We are testing a matrix of over 900 Acrobat installer configurations 
and we are tracking every file and registry entry that is affected by an 
installation.

2. a single file or registry entry that is affected by any test is 
stored in the db as a record.

3. a typical record is about 12 columns of string data. the data is all 
information about a file (mac or windows) or windows registry entry [ 
file or regkey name, file size,  modification date, checksum, 
permissions, owner, group, and in the case of a mac, we are getting all 
the hfs atts as well].

4. A typical test produces anywhere from 2000 - 5000 records.


Our db is getting to be a respectable size (about 10GB right now) and is 
growing slower and slower. I have been charged with making it faster and 
with a smaller footprint while retaining all of the current 
functionality.  here is one of my ideas.  Please tell me if I am crazy:

The strings that we are storing (mentioned in 3 above) are extremely 
repetitive. for example, there are a limited number of permissions for 
the files in the acrobat installer and we are storing this information 
over and over again in the tables. The same goes for filenames, registry 
key names and almost all of the data we are storing.  So it seems to me 
that to create a smaller and faster database we could assign an integer 
to each string and just store the integer representation of the string 
rather than the string itself.  Then we would just store the strings in 
a separate table one time and do join queries against the tables that 
are holding the strings and the main data tables.  for example,

a table that would hold unique permissions strings would look like

table:  perms_strs

string             |  id
---------------------
'drwxr-xr-x'   |   1
'-rw-------'    |   2
'drwxrwxr-x'  |   3
'-rw-r--r--'     |   4

then in my data I would just store 1,2,3 or 4 instead of the whole 
permissions string.

it seems to me that we would save lots of space and over time not see 
the same performance degradation.

anyways,  please tell me if this makes sense and make any other 
suggestions that you can think of.  I am just now starting this analysis 
so I cannot give specifics as to where we are seeing poor performance 
just yet.  just tell me if my concepts are correct.

thanks for your time and for suffering this email.

chao,

-Shane


In response to

Responses

pgsql-performance by date

Next:From: Jord TannerDate: 2003-11-25 18:58:28
Subject: Re: design question: general db performance
Previous:From: Tom LaneDate: 2003-11-25 17:07:49
Subject: Re: Where to start for performance problem?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group