Re: Optimizing DISTINCT with LIMIT

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: tmp <skrald(at)amossen(dot)dk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizing DISTINCT with LIMIT
Date: 2008-12-04 14:09:34
Message-ID: 878wqw117l.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

tmp <skrald(at)amossen(dot)dk> writes:

> If the input is "sufficiently" large and the LIMIT threshold "sufficiently"
> small, maintain the DISTINCT output by hashning while traversing the input and
> stop when the LIMIT threshold is reached. No sorting required and *at* *most*
> one read of input.

You mean like this?

postgres=# explain select distinct x from i limit 5;
QUERY PLAN
-------------------------------------------------------------------
Limit (cost=54.50..54.51 rows=1 width=304)
-> HashAggregate (cost=54.50..54.51 rows=1 width=304)
-> Seq Scan on i (cost=0.00..52.00 rows=1000 width=304)
(3 rows)

This will be in the upcoming 8.4 release.

Versions since about 7.4 or so have been capable of producing this plan but
not for DISTINCT, only for the equivalent GROUP BY query:

postgres=# explain select x from i group by x limit 5;

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2008-12-04 14:20:17 Re: Optimizing DISTINCT with LIMIT
Previous Message Magnus Hagander 2008-12-04 14:08:13 Re: Re: pgsql: Properly unregister OpenSSL callbacks when libpq is done with