Re: RFE: Column aliases in WHERE clauses

From: Mike Christensen <mike(at)kitchenpc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Daniel Serodio (lists)" <daniel(dot)lists(at)mandic(dot)com(dot)br>, Ryan Kelly <rpkelly22(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: RFE: Column aliases in WHERE clauses
Date: 2012-09-17 23:44:47
Message-ID: CABs1bs261SsQSJEy33x7+6zsT+1KQJHMkodqDLCCvG_XBsMgfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Mike Christensen <mike(at)kitchenpc(dot)com> writes:
>> This definitely makes sense in the context of aggregation, but I'm
>> wondering if the same argument applies in the use case originally
>> posted:
>
>> SELECT left(value, 1) as first_letter
>> FROM some_table
>> WHERE first_letter > 'a';
>
>> Obviously, you can write this as:
>
>> SELECT left(value, 1) as first_letter
>> FROM some_table
>> WHERE left(value, 1) > 'a';
>
>> This would run fine, though you'd be doing a sequential scan on the
>> entire table, getting the left most character in each value, then
>> filtering those results. This of course assumes you haven't built an
>> index on left(value, 1).
>
>> Thus, in theory the compiler *could* resolve the actual definition of
>> first_letter and substitute in that expression on the fly. I'm
>> wondering if that concept is actually disallowed by the SQL spec.
>
> Yes, it is. If you read the spec you'll find that the scope of
> visibility of names defined in the SELECT list doesn't include WHERE.
>
> It's easier to understand why this is if you realize that SQL has a very
> clear model of a "pipeline" of query execution. Conceptually, what
> happens is:
>
> 1. Form the cartesian product of the tables listed in FROM (ie, all
> combinations of rows).
>
> 2. Apply the WHERE condition to each row from 1, and drop rows that
> don't pass it.
>
> 3. If there's a GROUP BY, merge the surviving rows into groups.
>
> 4. If there's aggregate functions, compute those over the rows in
> each group.
>
> 5. If there's a HAVING, filter the grouped rows according to that.
>
> 6. Evaluate the SELECT expressions for each remaining row.
>
> 7. If there's an ORDER BY, evaluate those expressions and sort the
> remaining rows accordingly.
>
> (Obviously, implementations try to improve on this - you don't want
> to actually form the cartesian product - but that's the conceptual
> model.)
>
> The traditional shortcut of doing "ORDER BY select-column-reference"
> is okay according to this world view, because the SELECT expressions
> are already available when ORDER BY needs them. However, it's not
> sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
> because those steps precede the evaluation of the SELECT expressions.
>
> This isn't just academic nit-picking either, because the SELECT
> expressions might not be valid for rows that don't pass WHERE etc.
> Consider
> SELECT 1/x AS inverse FROM data WHERE x <> 0;
> The implementation *must* apply WHERE before computing the SELECT
> expressions, or it'll get zero-divide failures that should not happen.
>
> Now, having said all that, if you try it you'll find that Postgres
> does allow select column references in GROUP BY, using the model
> you propose above of copying whatever expression is in SELECT into
> GROUP BY. This is, to put it politely, a mistake that we are now
> stuck with for backwards-compatibility reasons. It's not spec compliant
> and it doesn't fit the language's conceptual model, but it's been that
> way for long enough that we're not likely to take it out. We are not,
> however, gonna introduce the same mistake elsewhere.
>
>> Obviously, it would add complexity (and compile overhead) but would be
>> somewhat handy to avoid repeating really complicated expressions.
>> Perhaps Common Table Expressions are a better way of doing this thing
>> anyhow.
>
> CTEs or sub-selects are a better answer for that. Each sub-select has
> its own instance of the conceptual pipeline.

Excellent information, Tom! I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..

Mike

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2012-09-18 00:19:06 Re: Index creation takes more time?
Previous Message Tom Lane 2012-09-17 23:32:45 Re: RFE: Column aliases in WHERE clauses