Re: 9/18 Visual Planner Meeting Wrapup

From: "Len Shapiro" <lenshap(at)gmail(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Mark Wong" <markwkm(at)gmail(dot)com>, "Postgresql PDX_Users" <pdxpug(at)postgresql(dot)org>, "David Maier" <maier(at)cs(dot)pdx(dot)edu>
Subject: Re: 9/18 Visual Planner Meeting Wrapup
Date: 2008-10-22 16:05:41
Message-ID: c5ee9b8a0810220905w20f1407fn8885da9d456de253@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pdxpug

Jeff,

Thanks for your words of wisdom. However, I do not think they "cut
the mustard".

The difficulty with your solutions is that they do not help an
optimizer to determine the cardinality of a join. Consider the
candidate-zip code example. The optimizer needs catalog information
to determine cardinalities, and it is hoping that zip is a foreign
key. It may be the case that, in some imaginary schema, zip is a
foreign key. But that imaginary schema is not in the catalog for the
optimizer to investigate. So the optimizer is SOL. That's my $.02,
anyway.

All the best,

Len

On Tue, Oct 21, 2008 at 10:07 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2008-10-20 at 20:38 -0700, Len Shapiro wrote:
>> My interest in the question is that if all useful joins are foreign
>> key joins then it makes things like optimizer estimates of join
>> selectivities much simpler. However, there are several useful
>> non-foreign key joins. They were provided to me by David Maier. I'll
>> provide the SQL for the first of them, and let you figure out the
>> rest:
>
> I might agree with this question (though I don't think so, I'd need more
> details to be sure):
>
>> "Who are the congressional candidates running in my district?"
>>
>
> But I disagree with the rest. The reason is that, for each of these,
> there exists some very reasonable database design in which the join does
> turn into a FK join (based on our offline conversation, you consider
> self joins to be implied FKs as well). I will present a few examples of
> designs below:
>
>> SELECT candname
>> FROM candidate JOIN resident ON candidate.zip = resident.zip
>> WHERE resident.name = 'Len Shapiro'
>
> 3 tables: person, resident, and candidate. Resident and candidate both
> have FKs to person, and so the 3-way join can be expressed using FK
> relationships.
>
>> "Which zip codes have both a Burgerville and a White Castle?"
>
> This looks either like a single table with a self join, and if not, it
> can be similarly decomposed as above.
>
>> "Who can TA each of the course offerings this year?"
>> [There can be more than one offering of the same course
>> number in a year, so course number is only a partial
>> key]
>
> 3 tables: course, course_offering, and potential_ta. I don't really know
> what the conditions here are supposed to be, but I'm pretty sure I can
> construct it using entirely FK joins.
>
> And so on. If a question presupposes that they use some particular
> database design, I think that disqualifies it from answering your
> problem.
>
> I think the only correct answer to your problem involves functional
> relationships that cannot be (sanely) materialized (as in my example,
> the functional relationship between a specific timestamp and the hour in
> which it occurs). To be useful, the function must not be one-to-one (as
> in my example), otherwise you could just join using the original value.
>
> If it relies on a functional relationship, the FK clearly cannot
> reference the result of a function (or if it can, you can simply ask a
> second question which relies on a different functional relationship),
> and thus it makes it impossible to simply alter the design to answer the
> question with FK joins.
>
> I happen to think time provides the most practical examples. For
> instance, let's say you have a bunch of events with a timestamp and
> timezone. You'd like to join based on the starting time of the events in
> local time. And let's say you'd also like to join based on GMT to answer
> a separate question. There's no design that can answer both those
> questions without relying on a function that can't be materialized into
> a relationship table. (To give these questions meaning, the first one
> might be "how did the turnout for these events compare with others
> happening at the same local time" and the second question might be "how
> did international news coverage of these events compare with others that
> happened at the same GMT time").
>
> Time truncation (i.e. getting the hour when you have microsecond
> resolution) provides more practical examples, although someone could
> plausibly break the timestamp into it's various fields. That's why you
> need to have a separate question that adds complexity (e.g. date math or
> timezones) to prevent that from being a realistic design (to use
> timezones you'd need to use the + operator, which can't be part of a FK
> relationship).
>
> Other practical examples exist, however. Salary ranges is a simple one:
> join people who make within $5k of each other, for instance. There are a
> million similar questions relating to demographics.
>
> Regards,
> Jeff Davis
>
>

In response to

Responses

Browse pdxpug by date

  From Date Subject
Next Message Jeff Davis 2008-10-22 16:52:01 Re: 9/18 Visual Planner Meeting Wrapup
Previous Message Jeff Davis 2008-10-21 17:07:08 Re: 9/18 Visual Planner Meeting Wrapup