Quick Links

design question: general db performance

From:	shane hill <shill(at)adobe(dot)com>
To:	pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject:	design question: general db performance
Date:	2003-11-25 18:42:47
Message-ID:	3FC3A2A7.6060804@adobe.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi folks,

Disclaimer: I am relatively new to RDBMSs, so please do not laugh at me
too loudly, you can laugh, just not too loudly and please do not point. :)

I am working on an Automated Installer Testing System for Adobe Systems
and I am doing a DB redesign of the current postgres db:

1. We are testing a matrix of over 900 Acrobat installer configurations
and we are tracking every file and registry entry that is affected by an
installation.

2. a single file or registry entry that is affected by any test is
stored in the db as a record.

3. a typical record is about 12 columns of string data. the data is all
information about a file (mac or windows) or windows registry entry [
file or regkey name, file size, modification date, checksum,
permissions, owner, group, and in the case of a mac, we are getting all
the hfs atts as well].

4. A typical test produces anywhere from 2000 - 5000 records.

Our db is getting to be a respectable size (about 10GB right now) and is
growing slower and slower. I have been charged with making it faster and
with a smaller footprint while retaining all of the current
functionality. here is one of my ideas. Please tell me if I am crazy:

The strings that we are storing (mentioned in 3 above) are extremely
repetitive. for example, there are a limited number of permissions for
the files in the acrobat installer and we are storing this information
over and over again in the tables. The same goes for filenames, registry
key names and almost all of the data we are storing. So it seems to me
that to create a smaller and faster database we could assign an integer
to each string and just store the integer representation of the string
rather than the string itself. Then we would just store the strings in
a separate table one time and do join queries against the tables that
are holding the strings and the main data tables. for example,

a table that would hold unique permissions strings would look like

table: perms_strs

then in my data I would just store 1,2,3 or 4 instead of the whole
permissions string.

it seems to me that we would save lots of space and over time not see
the same performance degradation.

anyways, please tell me if this makes sense and make any other
suggestions that you can think of. I am just now starting this analysis
so I cannot give specifics as to where we are seeing poor performance
just yet. just tell me if my concepts are correct.

thanks for your time and for suffering this email.

chao,

-Shane

In response to

Re: Optimize at 2003-11-24 20:56:17 from Gaetano Mendola

Responses

Re: design question: general db performance at 2003-11-25 18:58:28 from Jord Tanner
Re: design question: general db performance at 2003-11-25 19:12:47 from Josh Berkus
Re: design question: general db performance at 2003-11-25 19:23:18 from Jeff
Re: design question: general db performance at 2003-11-25 19:27:39 from Richard Huxton

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jord Tanner	2003-11-25 18:58:28	Re: design question: general db performance
Previous Message	Tom Lane	2003-11-25 17:07:49	Re: Where to start for performance problem?