Quick Links

custom crosstab question

From:	"Michael Swierczek" <mike(dot)swierczek(at)gmail(dot)com>
To:	"PostgreSQL Novice" <pgsql-novice(at)postgresql(dot)org>
Subject:	custom crosstab question
Date:	2007-04-25 14:36:43
Message-ID:	68b5b5880704250736v41261ffg5b506ecb7978af0d@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice

Ladies and Gentlemen,
This will be long, I apologize. I'm mostly looking for someone to
tell me there's an obvious solution I'm missing.

I have a table 'answer' with medical survey answers. Periodically the
data must be exported to CSV files in a quasi-crosstab format so they
can be examined by statisticians.

The complication is that many of the questions in the medical surveys
can be skipped entirely (if you report not using drugs at all, we
don't ask how recently you used marijuana, and so forth...) For
analysis, the exported data needs a 'category' column for each survey
question even if it has no corresponding answer and not just the
survey questions with answers that actually exist in the database.

Here is a simplified example of the tables, showing sample data for one survey:
survey_question: (id | survey_id | question_code | question_order)
1 | 1 | 'drug' | 1
2 | 1 | 'marijuana' | 2
3 | 1 | 'sick' | 3

answer: (id | survey_event_id | question_code | answer_order |
answer_value)
1 | 1 | 'drug' | 1 | 1
2 | 1 | 'sick' | 2 | 1

I wrote a program that takes a 'survey_id' input, queries PostgreSQL
for the set of question_codes, and then generates a massive query in
the form:
--- begin huge query
select an0.survey_event_id, an0.answer_value, an1.answer_value,
an2.answer_value,
an3.answer_value .....
from answer an0
left join answer an1 where an0.survey_event_id = an1.survey_event_id
and an1.question_code = 'drug'
left join answer an2 where an0.survey_event_id = an2.survey_event_id and
an2.question_code = 'marijuana'
left join....
--- end huge query

Depending upon the survey, the resulting queries have between 10 and
250 joins on the answer_table. I run them through psql and pipe the
results to a csv file. They work, but they're pretty slow. There
are 14,300 total survey_event entries from 50 different surveys in the
test database I'm using, and exporting all of the data in this way
takes 3 hours. I'm using PostgreSQL 8.1.9 and 8.2.4 (the latter is
faster, although I don't have exact figures how much).

Thanks,
Mike

Responses

Re: custom crosstab question at 2007-04-25 15:39:24 from Joe Conway

Browse pgsql-novice by date

	From	Date	Subject
Next Message	Alan Hodgson	2007-04-25 15:02:34	Re: moving data from windows to linux
Previous Message	Phillip Smith	2007-04-25 13:57:53	Re: delete non-unique