Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace

From: Chris Bitmead <chris(at)bitmead(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-sql <pgsql-sql(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace
Date: 2000-01-26 22:53:29
Message-ID: 388F7AE9.EDD10859@bitmead.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

Tom Lane wrote:
> SELECT DISTINCT ON (expression) target1, ...
> One immediate advantage of allowing an expression is that you can do
> distinct-on-two-fields in a rather klugy way, eg
> SELECT DISTINCT ON (field1 || ' ' || field2) ...

As long as we're fixing the syntax, I'm wondering if it wouldn't be more
logical to have DISTINCT ON somewhere later in the syntax. I'm wondering
if that might also avoid the need for () as a side effect. Like this
perhaps....

SELECT x, y, z FROM foo WHERE z DISTINCT ON x, y ORDER BY x, y;

> What I'd prefer to do is put in an
> error check that says "if you use both DISTINCT ON and ORDER BY, then
> the DISTINCT ON expression must be the first ORDER BY item".

Better, but still a little kludgy. What about a syntax that really
supports everything you want? Like maybe...

SELECT x, y, z FROM foo DISTINCT ON x, y DESC ORDER BY z ASC;

Distinct on now has a similar syntax to the order by clause. What this
means is, do the DISTINCT ON test by ordering DESC (so you get the
distinct item with the largest value of x, y), and then order the final
result by z ascending.

Unless I'm missing something that gives everybody what they want.

HANG ON, I've got a better idea.....

The other alternative is to make DISTINCT ON a bit like GROUP BY. So you
would have
something like

SELECT x, y, max(z) AS mmm FROM foo DISTINCT ON x, y ORDER BY mmm;

Like GROUP BY where you group by the non-aggregate fields in the SELECT
clause, you use DISTINCT ON, to mention the non-aggregate clauses in the
SELECT. The aggregate clause are used to select WHICH of the resulting
fields are selected for presentation. This would have the benefit of
being more general so you could select, say the city with the longest
name in each state...

SELECT state, zipcode, longest_string(cityname) FROM cities DISTINCT ON
state ORDER BY zipcode;

I don't know how far I'm stepping here from the SQL paradigm, but it
sure seems cool....

Chris.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Philip Warner 2000-01-26 22:56:27 Re: AW: AW: AW: [HACKERS] Some notes on optimizer cost estimates
Previous Message Fabio Ambrosanio 2000-01-26 22:39:38

Browse pgsql-sql by date

  From Date Subject
Next Message Iain.Mott 2000-01-26 23:03:40 RE: [SQL] Duplicate tuples with unique index
Previous Message Philip Warner 2000-01-26 22:23:24 Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace