Re: dynamic crosstab

From: Erik Jones <erik(at)myemma(dot)com>
To: Klein Balazs <Balazs(dot)Klein(at)t-online(dot)hu>
Cc: "'Tino Wildenhain'" <tino(at)wildenhain(dot)de>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: dynamic crosstab
Date: 2008-02-14 22:26:14
Message-ID: DD75275B-A579-4D5F-92FF-7BD961FC6A9A@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Feb 14, 2008, at 3:49 PM, Klein Balazs wrote:

> My point was to get rid of the the EAV setup. Something like:
>
> CREATE TABLE questions (
> question_id serial primary key,
> question text not null
> );
>
> CREATE TABLE people (
> person_id serial primary key,
> ....
> );
>
> CREATE TABLE answers (
> person_id integer references people,
> answers text[]
> );
>
> where the indexes into answers are ids from questions. You don't get
> any easy foreign keys for those indexes into the questions table,
> which you definitely don't have with the EAV setup anyway, but with
> this you don't need any kind of pivot/crosstab functionality.
>
>> I can't imagine how I could store data directly that way (beside
>> the usual
>> thing that whenever I can I aim to store scalar value in a column).
>>
>> To do what you suggest I could have this:
>> 1 (aaa,bbb,ccc)
>> 2 (ddd,NULL,eee)
>> but for this I would need to store a NULL for a person for all the
>> questions
>> he/she didn't answer. Now answers may come from all sorts of
>> questionnaires
>> so most people will only have responses on a subset, so this does
>> not seem
>> feasible.
>>
>> Or this:
>> 1 (aaa,bbb,ccc)
>> 2 (ddd,eee)
>> but this would be loosing the purpose - there is no longer a
>> meaningful way
>> to compare the same info at different people.
>>
>> So directly storing the info in this structure does not seem to be
>> the way
>> for me. On the other hand a query may be able to generate the
>> proper array
>> without the usual problem of outputting unknown number of columns.

First, please stop top-posting. It makes it difficult for both me
and others to know to whom/what you are replying.

Now on to the meat of the topic! When using arrays you do not need
to manually store NULLS -- they are implied by gaps in array
indices. Observe:

CREATE TABLE questions (
question_id INTEGER PRIMARY KEY,
question_text TEXT NOT NULL
);

CREATE TABLE people (
person_id SERIAL PRIMARY KEY,
answers TEXT[]
);

INSERT INTO questions (question_id, question_text) VALUES (2, 'Will
arrays work?'), (5, 'Can pigs fly?');

INSERT INTO people (person_id) VALUES (1), (2);

UPDATE people
SET answers[2] = 'yep!',
answers[5] = 'nope!',
answers[7] = 'this shouldn''t be here!'
where person_id=1;

UPDATE people
SET answers[5]='if only they had wings'
where person_id=2;

SELECT * FROM people;

person_id | answers
---------------
+-------------------------------------------------------------
1 | [2:7]={yep!,NULL,NULL,nope!,NULL,"this shouldn't
be here!"}
2 | [5:5]={"if only they had wings"}

See how postgres handles filling the NULLs for you? What you'd
really want to do with this would be to define some functions for
setting and getting a person's answers to a given question or set of
questions so that you could implement some kind of data integrity
with regards to question ids and indices into the answers arrays such
as in the example above you'd want to prevent an entry at index 7
when there is no entry in the questions table for question_id=7.
This whole thing is still wide open for adding extra layers such as
question groupings for separate questionnaires, etc.

Erik Jones

DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tony Caduto 2008-02-14 22:26:28 Re: PostgreSQLDirect versus Npgsql
Previous Message Josh Berkus 2008-02-14 22:21:51 Re: Suncoast Postgres User Group