Re: Normalized Tables & SELECT [was: Find "smallest common year"]

From: Nis Jørgensen <nis(at)superlativ(dot)dk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Normalized Tables & SELECT [was: Find "smallest common year"]
Date: 2007-09-28 08:36:52
Message-ID: fdief6$cmj$1@sea.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Stefan Schwarzer skrev:
>>> What would you recommend for say, 500 global national statistical
>>> variables,
>>> 500 regional and 500 subregional and 500 global aggregations? Years
>>> being
>>> covered having something between 10 and 60 years for each of these
>>> variables. All available for 240 countries/territories.
>>
>> I generally approach such problems by putting the data right
>> (normalized) at the start, then munging the data into summary tables
>> to handle the problems you're seeing now.
>>
>> I find it far easier to maintain normalized tables that produced
>> non-normalized ones (for things like data warehousing) than it is to
>> maintain non-normalized tables and trying to produce normalized data
>> from that.
>
> Ok, I do understand that.
>
> So, instead of the earlier mentioned database design, I would have
> something like this:
>
> - one table for the country names/ids/etc. (Afghanistan, 1; Albania,
> 2....)
> - one table for the variable names/ids/etc. (GDP, 1; Population, 2;
> Fish Catch, 3;....)
> - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
> ....)
> and
> - one table for all "statistical data" with four fields -
> id_variable, id_country, id_year, and the actual value

This is one posibility. Another is to have one table for each variable.
This has the benefit of not mixing different units/data types in the
same field. It does mean you cannot use the same (parameterized) query
for getting different measures.

Since it is easy to create views converting from one to the other of
these presentations, which one you choose is not that important

Also, there is no obvious need to have a lookup table for years - just
store the year as an integer in your data table(s). If necessary, add a
constraint indicating which years are valid. You can produce rows from
missing years by left joining with generate_series(start_year, end_year)

Even if you choose to store the valid years in a table, the id_year is
unnecessary - just use the year itself as the primary key.

More in another reply.

Nis

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2007-09-28 09:04:01 Re: Normalized Tables & SELECT [was: Find "smallest common year"]
Previous Message Alban Hertroys 2007-09-28 08:30:41 Re: Re: Why the ERROR: duplicate key violates unique constraint "master_pkey" is raised? - Is this a Bug?