Re: RFE: Column aliases in WHERE clauses

From: Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>
To: Mike Christensen <mike(at)kitchenpc(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: RFE: Column aliases in WHERE clauses
Date: 2012-09-18 08:29:36
Message-ID: 1347956976.21898.19.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 2012-09-17 at 16:44 -0700, Mike Christensen wrote:
> On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Mike Christensen <mike(at)kitchenpc(dot)com> writes:

[-----------]

> > It's easier to understand why this is if you realize that SQL has a very
> > clear model of a "pipeline" of query execution. Conceptually, what
> > happens is:
> >
> > 1. Form the cartesian product of the tables listed in FROM (ie, all
> > combinations of rows).
> >
> > 2. Apply the WHERE condition to each row from 1, and drop rows that
> > don't pass it.
> >
> > 3. If there's a GROUP BY, merge the surviving rows into groups.
> >
> > 4. If there's aggregate functions, compute those over the rows in
> > each group.
> >
> > 5. If there's a HAVING, filter the grouped rows according to that.
> >
> > 6. Evaluate the SELECT expressions for each remaining row.
> >
> > 7. If there's an ORDER BY, evaluate those expressions and sort the
> > remaining rows accordingly.
> >
> > (Obviously, implementations try to improve on this - you don't want
> > to actually form the cartesian product - but that's the conceptual
> > model.)
> >
> > The traditional shortcut of doing "ORDER BY select-column-reference"
> > is okay according to this world view, because the SELECT expressions

[--------------]

> > are already available when ORDER BY needs them. However, it's not
> > sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
> > because those steps precede the evaluation of the SELECT expressions.
> >
> > This isn't just academic nit-picking either, because the SELECT
> > expressions might not be valid for rows that don't pass WHERE etc.
> > Consider
> > SELECT 1/x AS inverse FROM data WHERE x <> 0;
> > The implementation *must* apply WHERE before computing the SELECT
> > expressions, or it'll get zero-divide failures that should not happen.

[-----------------]

>
> Excellent information, Tom! I've been somewhat curious on this
> behavior for some time now, and it's great to get a detailed answer..
>

Yes.

But it puzzles me, if it *conceptually* would be a signifficant
misstake, when what Tom calls "select-column-reference" (I understand
as: the colunm name introduced on the select-list), would actually be
regarded by the SQL parser as "macro-definition". Just to place the
*string*, defined at select-list-level by "AS <name>", to wherever it's
used in the WHERE/ORDER/GROUP or HAVING clauses at earlier then
select-list-evaluation processing stage.

Actual Tom's example(1):
SELECT 1/x AS inverse FROM data WHERE x <> 0;
extended to (2):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND 1/x > 20;
could be written by user as (3):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND inverse > 20;
but token/replaced to its form (2) before WHERE evaluation.

-R

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2012-09-18 09:10:54 Data recovery after inadvertent update?
Previous Message Herouth Maoz 2012-09-18 08:13:39 Re: Index creation takes more time?