Table and View Planning for Census Data

From: "Lee Hachadoorian" <lee(dot)hachadoorian(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Table and View Planning for Census Data
Date: 2008-05-28 18:28:05
Message-ID: 5ab13580805281128j60a29102vaf023288be8d184d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I have a question about how to efficiently set up tables and views to
work with census data. The census releases data partitioned (rows) by
state and segmented (columns) by arbitrary groupings of matrices, or
"subject tables". For example matrices P1 (total population), P2
(urban and rural), P3 (population by race), etc are all grouped in
segment 1. Because the census data set is huge, I am downloading and
importing data as needed, rather than all at once. Currently I am the
only user, although it could possibly be used by several coworkers in
the future. It's highly unlikely that more than a few people would be
hitting the database at the same time.

1) Often I will be querying for a specific geographic level (e.g.
counties) across state lines (e.g New York metro area, which includes
data from three states). How much would I gain from trying to set up
partitioning, as opposed to combining the states into one big table?
Or, since I would rarely be returning multiple geographic levels (like
census tracts *and* counties) at the same time, should I combine the
states but partition by geographic level?

2) The segments are arbitrary groupings, so I don't really want to
have to write a query for data from matrix P3 where I have to also
know that matrix P3 is in database table segment1. I can think of
three ways to deal with this. Please let me know which one is best,
or suggest something else I haven't thought of.
a) Create a view which combines all columns, then query for desired
columns. (But does a view have the same row size limitation as a
table? If so this wouldn't work, because the number of columns will
be in the thousands.)
b) Create a view for each matrix. Then I can query by matrix and
column (which is easier for my brain).
c) Break each matrix out into its own database table. (Most work,
how much of a performance gain will there be?)

Any input would be be appreciated.

Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

Browse pgsql-novice by date

  From Date Subject
Next Message G. J. Walsh 2008-05-28 21:24:46 Re Unable to recover tables
Previous Message A. Kretschmer 2008-05-27 10:16:35 Re: query very slow but table very small