Re: [SQL] DISTINCT ON: speak now or forever hold your peace

From: Julian Scarfe <jscarfe(at)callnetuk(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] DISTINCT ON: speak now or forever hold your peace
Date: 2000-01-25 13:23:58
Message-ID: B4B3546E.5A80%jscarfe@callnetuk.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

on 25/1/00 10:59, Peter Eisentraut at e99re41(at)DoCS(dot)UU(dot)SE wrote:

> On Tue, 25 Jan 2000, Julian Scarfe wrote:

>> I use 'distinct on' to get the most recent reports for each of a group of
>> locations. E.g.:
>>
>> create table reports (
>> location varchar(16),
>> report_time datetime,
>> report_text text);
>>
>> select distinct on location * from reports where location ~~ 'Lond%' order by
>> location, reporttime desc;<pgsql-hackers(at)postgreSQL(dot)org>
>>
>> to get the tuples that offer the most recent reports for each of London,
>> Londonderry, Londy etc.
>>
>> Is there an alternative?
>
> select location, max(report_time) from reports group by location
>
> This also has the negligible advantage that it gives you determinate
> results.

But I want report_text *too* -- that's the important bit, and it's not an
aggregate or common column in the group. So:

> Am I wrong or can DISTINCT ON not be rewritten in terms of GROUP BY in all
> cases?

I don't think so.

It has to be something like:

select * from reports as r1 where r1.report_time = (select max(report_time)
from reports as r2 where r2.location = r1.location) and r1.location ~~
'Lond%';

However, to my surprise, that's as quick as doing the 'distinct on'.

host=> explain select * from reports as r1 where r1.report_time = (select
max(report_time) from reports as r2 where r2.location = r1.location) and
r1.location ~~ 'Lond%';
NOTICE: QUERY PLAN:

Index Scan using reports_by_location_issuetime on reports r1 (cost=186.24
size=2 width=334)
SubPlan
-> Aggregate (cost=3.62 size=0 width=0)
-> Index Scan using reports_by_location_report_time on reports r2
(cost=3.62 size=13 width=8)

EXPLAIN

host=> explain select distinct on location * from reports where location ~~
'Lond%' order by location, report_time desc;
NOTICE: QUERY PLAN:

Unique (cost=186.24 size=0 width=0)
-> Sort (cost=186.24 size=0 width=0)
-> Index Scan using reports_by_location_issuetime on reports
(cost=186.24 size=2 width=334)

I'm surprise that the "reports_by_location_issuetime" index got used, since
it involves the issuetime field which is not the same as the report_time
field. I edited it and numerous other fields from the quoted 'create table'
for simplicity. There's also a "reports_by_location_report_time" which
indexes on, obviously, (location, report_time). Comments?

Julian Scarfe, learning...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-01-25 14:14:29 Re: [HACKERS] Happy column dropping
Previous Message Peter Eisentraut 2000-01-25 11:01:22 Re: [HACKERS] Well, then you keep your darn columns

Browse pgsql-sql by date

  From Date Subject
Next Message Don Baccus 2000-01-25 15:41:38 Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace
Previous Message Palle Girgensohn 2000-01-25 11:29:07 Re: [SQL] Problem with large tuples.