Skip site navigation (1) Skip section navigation (2)

Re: Normalized Tables & SELECT [was: Find "smallest common year"]

From: Nis Jørgensen <nis(at)superlativ(dot)dk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Normalized Tables & SELECT [was: Find "smallest common year"]
Date: 2007-09-28 08:36:52
Message-ID: fdief6$cmj$1@sea.gmane.org (view raw or flat)
Thread:
Lists: pgsql-general
Stefan Schwarzer skrev:
>>> What would you recommend for say, 500 global national statistical
>>> variables,
>>> 500 regional and 500 subregional and 500 global aggregations? Years
>>> being
>>> covered having something between 10 and 60 years for each of these
>>> variables. All available for 240 countries/territories.
>>
>> I generally approach such problems by putting the data right
>> (normalized) at the start, then munging the data into summary tables
>> to handle the problems you're seeing now.
>>
>> I find it far easier to maintain normalized tables that produced
>> non-normalized ones (for things like data warehousing) than it is to
>> maintain non-normalized tables and trying to produce normalized data
>> from that.
> 
> Ok, I do understand that.
> 
> So, instead of the earlier mentioned database design, I would have
> something like this:
> 
>    - one table for the country names/ids/etc. (Afghanistan, 1; Albania,
> 2....)
>    - one table for the variable names/ids/etc. (GDP, 1; Population, 2;
> Fish Catch, 3;....)
>    - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
> ....)
> and
>    - one table for all "statistical data" with four fields -
> id_variable, id_country, id_year, and the actual value

This is one posibility. Another is to have one table for each variable.
This has the benefit of not mixing different units/data types in the
same field. It does mean you cannot use the same (parameterized) query
for getting different measures.

Since it is easy to create views converting from one to the other of
these presentations, which one you choose is not that important

Also, there is no obvious need to have a lookup table for years - just
store the year as an integer in your data table(s). If necessary, add a
constraint indicating which years are valid. You can produce rows from
missing years by left joining with generate_series(start_year, end_year)

Even if you choose to store the valid years in a table, the id_year is
unnecessary - just use the year itself as the primary key.

More in another reply.

Nis


In response to

pgsql-general by date

Next:From: Alban HertroysDate: 2007-09-28 09:04:01
Subject: Re: Normalized Tables & SELECT [was: Find "smallest common year"]
Previous:From: Alban HertroysDate: 2007-09-28 08:30:41
Subject: Re: Re: Why the ERROR: duplicate key violates unique constraint "master_pkey" is raised? - Is this a Bug?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group