custom crosstab question

From: "Michael Swierczek" <mike(dot)swierczek(at)gmail(dot)com>
To: "PostgreSQL Novice" <pgsql-novice(at)postgresql(dot)org>
Subject: custom crosstab question
Date: 2007-04-25 14:36:43
Message-ID: 68b5b5880704250736v41261ffg5b506ecb7978af0d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Ladies and Gentlemen,
This will be long, I apologize. I'm mostly looking for someone to
tell me there's an obvious solution I'm missing.

I have a table 'answer' with medical survey answers. Periodically the
data must be exported to CSV files in a quasi-crosstab format so they
can be examined by statisticians.

The complication is that many of the questions in the medical surveys
can be skipped entirely (if you report not using drugs at all, we
don't ask how recently you used marijuana, and so forth...) For
analysis, the exported data needs a 'category' column for each survey
question even if it has no corresponding answer and not just the
survey questions with answers that actually exist in the database.

Here is a simplified example of the tables, showing sample data for one survey:
survey_question: (id | survey_id | question_code | question_order)
1 | 1 | 'drug' | 1
2 | 1 | 'marijuana' | 2
3 | 1 | 'sick' | 3

answer: (id | survey_event_id | question_code | answer_order |
answer_value)
1 | 1 | 'drug' | 1 | 1
2 | 1 | 'sick' | 2 | 1

Using tablefunc/crosstab, the resulting export would be
(survey_event_id | 'drug' | 'sick' .... )
But I need (survey_event_id | 'drug' | 'marijuana' | 'sick' .... )

I wrote a program that takes a 'survey_id' input, queries PostgreSQL
for the set of question_codes, and then generates a massive query in
the form:
--- begin huge query
select an0.survey_event_id, an0.answer_value, an1.answer_value,
an2.answer_value,
an3.answer_value .....
from answer an0
left join answer an1 where an0.survey_event_id = an1.survey_event_id
and an1.question_code = 'drug'
left join answer an2 where an0.survey_event_id = an2.survey_event_id and
an2.question_code = 'marijuana'
left join....
--- end huge query

Depending upon the survey, the resulting queries have between 10 and
250 joins on the answer_table. I run them through psql and pipe the
results to a csv file. They work, but they're pretty slow. There
are 14,300 total survey_event entries from 50 different surveys in the
test database I'm using, and exporting all of the data in this way
takes 3 hours. I'm using PostgreSQL 8.1.9 and 8.2.4 (the latter is
faster, although I don't have exact figures how much).

Thanks,
Mike

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Alan Hodgson 2007-04-25 15:02:34 Re: moving data from windows to linux
Previous Message Phillip Smith 2007-04-25 13:57:53 Re: delete non-unique