Re: MAX/MIN optimization via rewrite (plus query rewrites generally)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MAX/MIN optimization via rewrite (plus query rewrites generally)
Date: 2004-11-11 22:46:17
Message-ID: 12475.1100213177@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Oh? How is a first() aggregate going to know what sort order you want
>> within the group?

> It would look something like

> select x,first(a),first(b) from (select x,a,b from table order by x,y) group by x

> which is equivalent to

> select DISTINCT ON (x) x,a,b from table ORDER BY x,y

No, it is not. The GROUP BY has no commitment to preserve order ---
consider for example the possibility that we implement the GROUP BY by
hashing.
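
[Illustrative note, not part of the original message.] The point about order can be sketched with a toy model. This is not PostgreSQL's executor; the helper names group_first and group_max and the sample rows are hypothetical, made up for this illustration. It assumes only that a grouping step may hand rows to an aggregate in any arrival order. Under that assumption an order-sensitive aggregate like first() can return different answers for the same set of rows, while an order-insensitive one like max() cannot.

```python
from itertools import permutations

def group_first(rows):
    """Toy first() aggregate: keep the first 'a' value seen for each key x."""
    result = {}
    for x, _y, a in rows:
        result.setdefault(x, a)  # only the first arrival per group is kept
    return result

def group_max(rows):
    """Toy max() aggregate: keep the largest 'a' value seen for each key x."""
    result = {}
    for x, _y, a in rows:
        result[x] = max(result.get(x, a), a)
    return result

# Hypothetical sample rows of the form (x, y, a).
rows = [(1, 10, "p"), (1, 20, "q"), (2, 5, "r")]

# Feed the same set of rows in every possible arrival order and collect
# the distinct answers each aggregate produces.
first_results = {tuple(sorted(group_first(p).items())) for p in permutations(rows)}
max_results = {tuple(sorted(group_max(p).items())) for p in permutations(rows)}

print(len(first_results), "distinct first() results")
print(len(max_results), "distinct max() results")
```

In this model first() has more than one possible answer for group x = 1, so its result is only well defined if the grouping step promises a particular input order, which is exactly the promise GROUP BY does not make.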

> The group by can see that the subquery is already sorted by x and
> doesn't need to be resorted. In fact I believe you added the smarts to
> detect that condition in response to a user asking about precisely
> this type of scenario.

The fact that an optimization is present does not make it part of the
guaranteed semantics of the language.

Basically, first() is a broken concept in SQL. Of course DISTINCT ON
is broken too for the same reasons, but I do not see that first() is
one whit less of a kluge than DISTINCT ON.

regards, tom lane
