I'm working on a web app for a quality control checklist. I already
have a table set up, but I have a hunch that our model is sub-optimal
and I could get some better performance. I'm hoping someone on this
list can help me think clearly about how to express this efficiently.
Each checklist has dozens, sometimes hundreds of questions. Each
question has between 2 and 10 possible answers. Each question is a
varchar string, and so is each answer. A checklist is complete when
every question is associated with exactly one of its possible answers
-- i.e., when one answer is chosen.
Checklists are different for different purposes, and they can change
over time. So to keep completed checklists from inadvertently changing
when we want to have changes in new checklists, we have templates.
Templates, Questions, and Answers are a mirror of Checklists,
Questions, and Answers, and represent the 'current version' of the
checklist. We don't do the checklists ourselves, but provide them for
our clients, so that is another level of grouping going on.
So the current table hierarchy looks like this
Because we don't want changes in the current template to 'go back in
time' and change completed checklists, data is copied from Templates
into Checklists when a user goes to start a new checklist.
As you can guess, this creates a lot of duplication. In
ChecklistQuestionAnswers, out of about a million answer rows, there
are only 4,000 distinct answers. Of course, TemplatesQuestionAnswers
has duplication too, but not as bad.
So what I think I want to do is create a versioning system for
checklist templates, so I can save on space by storing unique
questions with unique sets of answers only once. That way, instead of
duplicating text wholesale, I can just link a Checklist against a
*version* of a Template, and then a completed checklist is just a
record of which answer was chosen for which question.
Here's what I've sketched out so far.
"A client has many templates. A template has many revisions, but only
one current revision. Each revision has many questions, and each
question has many (between 2 and 10) answers. Each Checklist relates
to one Template. Each checklist has a set of answers that indicate the
answer selected for each question in its version of the template."
Questions /* all unique question wordings */
Answers /* all unique answer wordings. */
Templates.client_id /* relates to client table. */
Templates.current_version /* this is related to
TemplateVersions /* A logical grouping of a set of questions and answers */
TemplateVersions.template_id /* relates this version to a template. */
TemplateQuestions.template_version /* relates a question to a
template version */
TemplateQuestions.question_id /* relates a unique question to this
template version */
TemplateQuestionAnswers.template_question_id /* relates this
answer to a particular template version question */
TemplateQuestionAnswers.answer_id /* relates the unique question
to a unique answer */
Checklists.template_version /* relates this checklist to a template
version -- associating this checklist to a client happens through this
ChecklistAnswers /* ( I might call this something other than
'Answers' since the lack of ChecklistQuestionAnswers breaks 'name
symmetry' with TemplateQuestionAnswers ) */
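To make the sketch above concrete, here is roughly how I imagine the DDL, wrapped in a small Python script so it can be run against an in-memory SQLite database (SQLite is only a stand-in for PostgreSQL here). The "id" and "wording" columns, the UNIQUE constraints, and the intern_wording helper are my assumptions and are not part of the sketch. ChecklistAnswers is left out because I haven't pinned down its columns yet.

```python
import sqlite3

# Hypothetical DDL for the sketched tables. Column names such as "id" and
# "wording" are assumptions; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE Questions (id INTEGER PRIMARY KEY, wording TEXT NOT NULL UNIQUE);
CREATE TABLE Answers   (id INTEGER PRIMARY KEY, wording TEXT NOT NULL UNIQUE);
CREATE TABLE Templates (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    current_version INTEGER              -- filled in once a version exists
);
CREATE TABLE TemplateVersions (
    id INTEGER PRIMARY KEY,
    template_id INTEGER NOT NULL REFERENCES Templates(id)
);
CREATE TABLE TemplateQuestions (
    id INTEGER PRIMARY KEY,
    template_version INTEGER NOT NULL REFERENCES TemplateVersions(id),
    question_id INTEGER NOT NULL REFERENCES Questions(id),
    UNIQUE (template_version, question_id)
);
CREATE TABLE TemplateQuestionAnswers (
    id INTEGER PRIMARY KEY,
    template_question_id INTEGER NOT NULL REFERENCES TemplateQuestions(id),
    answer_id INTEGER NOT NULL REFERENCES Answers(id),
    UNIQUE (template_question_id, answer_id)
);
CREATE TABLE Checklists (
    id INTEGER PRIMARY KEY,
    template_version INTEGER NOT NULL REFERENCES TemplateVersions(id)
);
"""


def build():
    """Create an in-memory database holding the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def intern_wording(conn, table, wording):
    """Return the id for a wording, storing the text only if it is new.

    `table` must be a trusted name ("Questions" or "Answers"), since it is
    interpolated into the SQL text.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (wording) VALUES (?)", (wording,))
    row = conn.execute(f"SELECT id FROM {table} WHERE wording = ?", (wording,)).fetchone()
    return row[0]
```

The point of intern_wording is that a repeated wording like "Yes" gets one row in Answers no matter how many template versions use it, which is where I'd expect the space savings to come from.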
The rub I'm getting hung up on is guaranteeing that ChecklistAnswers
associates a proper question-and-answer pair -- the relationship that
exists in the version of the Template that its Checklist parent is
In other words, each row in ChecklistAnswers must 'mirror' a
question_id from TemplateQuestions to one child question from
TemplateQuestionAnswers, from the template_version in Checklists. I'm
trying to think of how to do this and my thinking process
short-circuits here. This is really the 'deliverable' of the database -- a
completed checklist -- so all the other templates and everything is
sort of epiphenomenal or an abstraction of that. If I can't get this
working, I've missed the whole point!
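One idea I've been toying with, purely as a sketch: carry template_version and template_question_id redundantly in ChecklistAnswers and use composite foreign keys, so the database itself rejects a question or answer that doesn't belong to the checklist's version. The ChecklistAnswers columns, the extra UNIQUE constraints, and the try_answer helper below are all hypothetical. The script uses Python with in-memory SQLite as a stand-in; PostgreSQL also accepts composite foreign keys that reference a UNIQUE constraint. Questions, Answers, and Templates are left out to keep it short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE TemplateVersions (id INTEGER PRIMARY KEY);
CREATE TABLE TemplateQuestions (
    id INTEGER PRIMARY KEY,
    template_version INTEGER NOT NULL REFERENCES TemplateVersions(id),
    question_id INTEGER NOT NULL,
    UNIQUE (id, template_version)        -- target for a composite FK
);
CREATE TABLE TemplateQuestionAnswers (
    id INTEGER PRIMARY KEY,
    template_question_id INTEGER NOT NULL REFERENCES TemplateQuestions(id),
    answer_id INTEGER NOT NULL,
    UNIQUE (id, template_question_id)    -- target for a composite FK
);
CREATE TABLE Checklists (
    id INTEGER PRIMARY KEY,
    template_version INTEGER NOT NULL REFERENCES TemplateVersions(id),
    UNIQUE (id, template_version)        -- target for a composite FK
);
CREATE TABLE ChecklistAnswers (
    checklist_id INTEGER NOT NULL,
    template_version INTEGER NOT NULL,
    template_question_id INTEGER NOT NULL,
    template_question_answer_id INTEGER NOT NULL,
    -- at most one chosen answer per question per checklist
    PRIMARY KEY (checklist_id, template_question_id),
    -- the version recorded here must be the checklist's own version
    FOREIGN KEY (checklist_id, template_version)
        REFERENCES Checklists (id, template_version),
    -- the question must belong to that same version
    FOREIGN KEY (template_question_id, template_version)
        REFERENCES TemplateQuestions (id, template_version),
    -- the chosen answer must be one offered for that question
    FOREIGN KEY (template_question_answer_id, template_question_id)
        REFERENCES TemplateQuestionAnswers (id, template_question_id)
);
"""


def demo():
    """Build the tables with two versions and one checklist on version 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO TemplateVersions (id) VALUES (1), (2);
        INSERT INTO TemplateQuestions VALUES (10, 1, 100), (20, 2, 100);
        INSERT INTO TemplateQuestionAnswers VALUES
            (501, 10, 7), (502, 10, 8), (601, 20, 7);
        INSERT INTO Checklists VALUES (1000, 1);
    """)
    return conn


def try_answer(conn, checklist_id, version, question_id, answer_id):
    """Try to record one chosen answer; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ChecklistAnswers VALUES (?, ?, ?, ?)",
                (checklist_id, version, question_id, answer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With the sample rows in demo(), try_answer(conn, 1000, 1, 10, 501) should go through, while pairing question 10 with answer 601 (which belongs to question 20), or pointing checklist 1000 at a version 2 question, should be refused. The cost is the redundant template_version column in ChecklistAnswers, which may or may not be worth it.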
This seems a *little* unwieldy, so I'm wondering if I'm making a
solution whose complexity is not worth the space-savings I might get
from implementing it.
Also note, I've simplified this a bit. There are other dimensions of
complexity, such as a category system for grouping questions for
reporting, but I don't think we need to get into that here.
"Computers are useless. They can only give you answers"
-- Pablo Picasso