Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace

From: Chris Bitmead <chris(at)bitmead(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-sql <pgsql-sql(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace
Date: 2000-01-26 22:53:29
Message-ID: 388F7AE9.EDD10859@bitmead.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-sql
Tom Lane wrote:
>      SELECT DISTINCT ON (expression) target1, ...
> One immediate advantage of allowing an expression is that you can do
> distinct-on-two-fields in a rather klugy way, eg
>                 SELECT DISTINCT ON (field1 || ' ' || field2) ...

As long as we're fixing the syntax, I'm wondering if it wouldn't be more
logical to have DISTINCT ON somewhere later in the syntax. I'm wondering
if that might also avoid the need for () as a side effect. Like this
perhaps....

SELECT x, y, z FROM foo WHERE z DISTINCT ON x, y ORDER BY x, y;

> What I'd prefer to do is put in an
> error check that says "if you use both DISTINCT ON and ORDER BY, then
> the DISTINCT ON expression must be the first ORDER BY item".  

Better, but still a little kludgy. What about a syntax that really
supports everything you want? Like maybe...

SELECT x, y, z FROM foo DISTINCT ON x, y DESC ORDER BY z ASC;

Distinct on now has a similar syntax to the order by clause. What this
means is, do the DISTINCT ON test by ordering DESC (so you get the
distinct item with the largest value of x, y), and then order the final
result by z ascending.

Unless I'm missing something that gives everybody what they want.

HANG ON, I've got a better idea.....

The other alternative is to make DISTINCT ON a bit like GROUP BY. So you
would have
something like

SELECT x, y, max(z) AS mmm FROM foo DISTINCT ON x, y ORDER BY mmm;

Like GROUP BY where you group by the non-aggregate fields in the SELECT
clause, you use DISTINCT ON, to mention the non-aggregate clauses in the
SELECT. The aggregate clause are used to select WHICH of the resulting
fields are selected for presentation. This would have the benefit of
being more general so you could select, say the city with the longest
name in each state...

SELECT state, zipcode, longest_string(cityname) FROM cities DISTINCT ON
state ORDER BY zipcode;

I don't know how far I'm stepping here from the SQL paradigm, but it
sure seems cool....

Chris.

In response to

Responses

pgsql-hackers by date

Next:From: Philip WarnerDate: 2000-01-26 22:56:27
Subject: Re: AW: AW: AW: [HACKERS] Some notes on optimizer cost estimates
Previous:From: Fabio AmbrosanioDate: 2000-01-26 22:39:38
Subject:

pgsql-sql by date

Next:From: Iain.MottDate: 2000-01-26 23:03:40
Subject: RE: [SQL] Duplicate tuples with unique index
Previous:From: Philip WarnerDate: 2000-01-26 22:23:24
Subject: Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group