Re: 9/18 Visual Planner Meeting Wrapup

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: len(at)pdx(dot)edu
Cc: Mark Wong <markwkm(at)gmail(dot)com>, Postgresql PDX_Users <pdxpug(at)postgresql(dot)org>, David Maier <maier(at)cs(dot)pdx(dot)edu>
Subject: Re: 9/18 Visual Planner Meeting Wrapup
Date: 2008-10-21 17:07:08
Message-ID: 1224608828.28882.51.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pdxpug

On Mon, 2008-10-20 at 20:38 -0700, Len Shapiro wrote:
> My interest in the question is that if all useful joins are foreign
> key joins then it makes things like optimizer estimates of join
> selectivities much simpler. However, there are several useful
> non-foreign key joins. They were provided to me by David Maier. I'll
> provide the SQL for the first of them, and let you figure out the
> rest:

I might agree with this question (though I don't think so, I'd need more
details to be sure):

> "Who are the congressional candidates running in my district?"
>

But I disagree with the rest. The reason is that, for each of these,
there exists some very reasonable database design in which the join does
turn into a FK join (based on our offline conversation, you consider
self joins to be implied FKs as well). I will present a few examples of
designs below:

> SELECT candname
> FROM candidate JOIN resident ON candidate.zip = resident.zip
> WHERE resident.name = 'Len Shapiro'

3 tables: person, resident, and candidate. Resident and candidate both
have FKs to person, and so the 3-way join can be expressed using FK
relationships.

> "Which zip codes have both a Burgerville and a White Castle?"

This looks either like a single table with a self join, and if not, it
can be similarly decomposed as above.

> "Who can TA each of the course offerings this year?"
> [There can be more than one offering of the same course
> number in a year, so course number is only a partial
> key]

3 tables: course, course_offering, and potential_ta. I don't really know
what the conditions here are supposed to be, but I'm pretty sure I can
construct it using entirely FK joins.

And so on. If a question presupposes that they use some particular
database design, I think that disqualifies it from answering your
problem.

I think the only correct answer to your problem involves functional
relationships that cannot be (sanely) materialized (as in my example,
the functional relationship between a specific timestamp and the hour in
which it occurs). To be useful, the function must not be one-to-one (as
in my example), otherwise you could just join using the original value.

If it relies on a functional relationship, the FK clearly cannot
reference the result of a function (or if it can, you can simply ask a
second question which relies on a different functional relationship),
and thus it makes it impossible to simply alter the design to answer the
question with FK joins.

I happen to think time provides the most practical examples. For
instance, let's say you have a bunch of events with a timestamp and
timezone. You'd like to join based on the starting time of the events in
local time. And let's say you'd also like to join based on GMT to answer
a separate question. There's no design that can answer both those
questions without relying on a function that can't be materialized into
a relationship table. (To give these questions meaning, the first one
might be "how did the turnout for these events compare with others
happening at the same local time" and the second question might be "how
did international news coverage of these events compare with others that
happened at the same GMT time").

Time truncation (i.e. getting the hour when you have microsecond
resolution) provides more practical examples, although someone could
plausibly break the timestamp into it's various fields. That's why you
need to have a separate question that adds complexity (e.g. date math or
timezones) to prevent that from being a realistic design (to use
timezones you'd need to use the + operator, which can't be part of a FK
relationship).

Other practical examples exist, however. Salary ranges is a simple one:
join people who make within $5k of each other, for instance. There are a
million similar questions relating to demographics.

Regards,
Jeff Davis

In response to

Responses

Browse pdxpug by date

  From Date Subject
Next Message Len Shapiro 2008-10-22 16:05:41 Re: 9/18 Visual Planner Meeting Wrapup
Previous Message Len Shapiro 2008-10-21 14:40:58 Re: 9/18 Visual Planner Meeting Wrapup