Re: Database storage bloat

From: "Tony and Bryn Reina" <reina_ga(at)hotmail(dot)com>
To: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Database storage bloat
Date: 2004-04-08 15:49:26
Message-ID: BAY8-DAV48dqvhu1NQJ0001f557@hotmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

> Well, an important question is where is that space going? It'd be
> interesting to give a breakup by the directories and then which files (and
> using the contrib/oid2name to get which table/indexes/etc they are).
>
> At least 16MB of that is probably going into the transaction log (IIRC
> that's the default size for the segments) in pg_xlog.
>

Thanks Stephan. That at least helps me narrow it down to my one problem
table: segmentvalues. It is taking up almost 50 MB and its primary key is
over 38MB. Nothing else even comes close. Here's the relevant output for
oid2name in order of size.

SIZE (KB) OID TABLENAME
48844 17296 = segmentvalues
38100 17298 = segmentvalues_pkey
1108 16642 = pg_proc_proname_args_nsp_index
852 17246 = neuralvalues
676 17248 = neuralvalues_pkey
..
..

Tom asked to see the table schema. Here's the 3 relevant tables for the
bloat:

CREATE SEQUENCE FileIndicies_fileID_seq;
CREATE TABLE FileIndicies (
fileID integer DEFAULT nextval('FileIndicies_fileID_seq') UNIQUE
NOT NULL, --PK
szFileName text NOT NULL,
szFileType text,
CreationDate date,
CreationTime time,
dtimestampresolution float4,
dtimespan float4,
szAppName text,
szFileComment text,
PRIMARY KEY (fileID),
UNIQUE (szFileName, szFileType, CreationDate, CreationTime)
);

CREATE SEQUENCE EntityFile_dbIndex_seq;
CREATE TABLE EntityFile (
EntityID integer, --PK
fileID integer REFERENCES FileIndicies (fileID),
dbIndex integer DEFAULT nextval('EntityFile_dbIndex_seq') UNIQUE NOT
NULL,
PRIMARY KEY (fileID, EntityID)
);

CREATE TABLE SegmentValues (
dbIndex integer REFERENCES EntityFile (dbIndex),
dwunitid smallint,
dwsampleindex smallint,
dtimestamp float4,
dvalue float4,
PRIMARY KEY (dbIndex, dtimestamp, dwsampleindex, dwunitid)
);

I suppose one thing the binary flat file may be doing is not including the
time stamp in table SegmentValues. Since I know the sampling rate, I can
just calculate the timestamp on the fly by the rate times the index
(assuming no time offset). That would lose a float4 field, but would add
back a smallint field to the table.

Is there any further normalization that I'm missing?

-Tony

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Stephan Szabo 2004-04-08 16:08:46 Re: [Fwd: Re: Location of a new column]
Previous Message Epps, Aaron M. 2004-04-08 15:09:46 Connecting to PostgreSQL via PgAdmin III