Improve DB Size / Performance with Table Refactoring

From: Anthony Presley <anthony(at)resolution(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Improve DB Size / Performance with Table Refactoring
Date: 2012-08-11 14:23:48
Message-ID: CALL+CapPBoCBCQz6Cb9PfCj6EvRLp5g5Wkiay8zBiirO+NVLjQ@mail.gmail.com
Lists: pgsql-performance

Hi there!

We currently have a database table that's laid out something like this:
id int
date1 date
belongs_to date
type varchar(1)
type_fk int
start_time time
end_time time
location_fk int
department_fk int
value decimal

Each row represents one slice of the day (96 data points per day, one for
each 15-minute period), and each "type_fk", department, and location can
have up to, say, 3 rows for a given start / end time and date (based on the
"type").

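In DDL form the current table is roughly this ("day_values" is a made-up
name here, and constraints / indexes are omitted):

CREATE TABLE day_values (
    id            int,
    date1         date,
    belongs_to    date,
    type          varchar(1),
    type_fk       int,
    start_time    time,
    end_time      time,
    location_fk   int,
    department_fk int,
    value         decimal
);
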
This table has grown rapidly - we're adding about 1 - 2 million rows per
month - and almost all of our queries sum the values by the belongs_to date
and the location_fk; however, for other statistics we need to keep the
values separate. This one table now makes up more than 60% of our database,
and we want to come up with a better way to store it. (To speed up other
queries, we already roll this table up into a daily table.)
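
The bulk of our queries look roughly like this (the dates and names are
just illustrative):

SELECT belongs_to, location_fk, SUM(value) AS total_value
FROM day_values
WHERE belongs_to BETWEEN DATE '2012-07-01' AND DATE '2012-07-31'
GROUP BY belongs_to, location_fk;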

We're considering changing the structure of this table into one of the
following structures:

Option [A]:
id int
date1 date
belongs_to date
type_fk int
location_fk int
department_fk int
value_type1_0 decimal
....
value_type1_96 decimal
value_type2_0 decimal
....
value_type2_96 decimal
value_type3_0 decimal
....
value_type3_96 decimal

or, as an alternative:

Option [B]:
id int
date1 date
belongs_to date
type varchar(1)
type_fk int
location_fk int
department_fk int
value_type_0 decimal
....
value_type_96 decimal
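
(Rather than typing out all the value columns, either layout could be
generated; e.g. for Option B, assuming the 15-minute slots are numbered 0
through 95 - adjust the range to match the actual numbering - something
like:)

SELECT 'CREATE TABLE day_values_b (
    id            int,
    date1         date,
    belongs_to    date,
    type          varchar(1),
    type_fk       int,
    location_fk   int,
    department_fk int,
'   || string_agg(format('    value_type_%s decimal', i), E',\n' ORDER BY i)
    || E'\n);'
FROM generate_series(0, 95) AS s(i);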

We're having a hard time choosing between the two options. We'll
definitely partition either one by the date1 or belongs_to column to speed
up the queries.
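
(A minimal sketch of what we have in mind, using the usual
inheritance-based partitioning - names are illustrative and only a few
columns are shown:)

-- Parent table: whichever layout we pick, plus the value columns from A or B.
CREATE TABLE day_values_parent (
    belongs_to    date NOT NULL,
    location_fk   int,
    department_fk int
    -- ... value columns omitted here
);

-- One child table per month of belongs_to.
CREATE TABLE day_values_2012_08 (
    CHECK (belongs_to >= DATE '2012-08-01' AND belongs_to < DATE '2012-09-01')
) INHERITS (day_values_parent);

CREATE INDEX day_values_2012_08_idx
    ON day_values_2012_08 (belongs_to, location_fk);

-- With constraint_exclusion = partition (the default), queries that filter
-- on belongs_to should only scan the matching child tables.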

Option A would mean that any given date would have only a single row, with
all three "types" folded into its columns. However, this table would have
6 + 96*3 columns, and in many cases at least 96 of those columns would be
empty. More often than not, at least half of the columns would be empty
(most locations aren't open all day).

Option B would create a row only if the type had data in it, but the other 6
columns would be duplicated across those rows. Again, many of the value
columns might be empty.

From a space / size perspective, which option is the better choice?

How does PostgreSQL handle storing empty (NULL) columns?

Thanks!

--
Anthony
