Re: 9/18 Visual Planner Meeting Wrapup

From: "Len Shapiro" <lenshap(at)gmail(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Postgresql PDX_Users" <pdxpug(at)postgresql(dot)org>, "David Maier" <maier(at)cs(dot)pdx(dot)edu>, "Mark Wong" <markwkm(at)gmail(dot)com>
Subject: Re: 9/18 Visual Planner Meeting Wrapup
Date: 2008-11-10 23:52:15
Message-ID: c5ee9b8a0811101552w293267ffo10a7be3cf6701646@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pdxpug

Jeff and I have agreed that, in an attempt to avoid spam, this
will be the last email in our conversation that we will copy to
others. If you'd like to be cced on future emails in this
conversation, please contact me and I will include you.

I think it will help to clarify this conversation if I give some
more details of my motivation for wanting to be able to claim that
most joins are what I call foreign key (FK) joins. I agree now that
there are several meaningful joins that are not FK joins but I think I
can still say that almost all joins are FK joins.
Here's what I mean by "FK join": L join R on joining attributes
L(l) and R(r), where values of l normally exist in the column r and
the column r is unique. It is not necessary that every value of l
exist in R(r), in any particular instance. If every value of l exists
in R(r) in every instance, then an FK constraint holds.
Optimization is a difficult task, and estimating the size of
operator results is one of the most difficult parts of optimization.
Join size estimation is particularly difficult - existing formulas
make strong assumptions which are often violated. However, if a join
is an FK join as described above, it is possible to estimate the size
of the join quite accurately. The first case is when the constraint
holds. In the case the size of the join of L join R is exactly L. If
the constraint does not hold, the optimizer can sample the extent to
which it does not hold and apply the percentage to the size estimate.
Voila, a clean and accurate size estimate for the join.
If this all makes sense, I plan to ask a student to look into the
PG planner (optimizer) to see if it's possible to integrate this idea
into it.

Perhaps now it is clearer why I am not happy with an "imaginary"
schema. It does no good if there is another table K such that L join
K is an FK join but K is not in the database. Because then the
optimizer cannot check that the target join attribute in K is unique
and K cannot be sampled.

All the best,

Len

On Wed, Oct 22, 2008 at 8:52 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Wed, 2008-10-22 at 09:05 -0700, Len Shapiro wrote:
>> The difficulty with your solutions is that they do not help an
>> optimizer to determine the cardinality of a join. Consider the
>> candidate-zip code example. The optimizer needs catalog information
>> to determine cardinalities, and it is hoping that zip is a foreign
>> key. It may be the case that, in some imaginary schema, zip is a
>> foreign key. But that imaginary schema is not in the catalog for the
>> optimizer to investigate. So the optimizer is SOL. That's my $.02,
>> anyway.
>>
>
> I don't know what you mean by imaginary schema. You asked for a join,
> which is an operation on value(s), so we'd have to find a join where
> there is no reasonable design such that the join is a FK join.
>
> And isn't it hard to determine the cardinality of any self join? That
> would mean any self join would answer your problem. In fact, it seems
> like all (or most) of the examples given can be expressed as a self join
> in some reasonable database schema.
>
> Regards,
> Jeff Davis
>
>

In response to

Responses

Browse pdxpug by date

  From Date Subject
Next Message David E. Wheeler 2008-11-10 23:56:20 Re: 9/18 Visual Planner Meeting Wrapup
Previous Message Mark Wong 2008-11-07 19:15:07 11/20 Randal + Smalltalk