Quick Links

Optimizing DISTINCT with LIMIT

From:	tmp <skrald(at)amossen(dot)dk>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Optimizing DISTINCT with LIMIT
Date:	2008-12-04 13:32:51
Message-ID:	gh8m5v$7oj$1@news.hub.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

As far as I have understood the following query
SELECT DISTINCT foo
FROM bar
LIMIT baz
is done by first sorting the input and then traversing the sorted data,
ensuring uniqueness of output and stopping when the LIMIT threshold is
reached. Furthermore, a part of the sort procedure is to traverse input
at least one time.

Now, if the input is large but the LIMIT threshold is small, this
sorting step may increase the query time unnecessarily so here is a
suggestion for optimization:
If the input is "sufficiently" large and the LIMIT threshold
"sufficiently" small, maintain the DISTINCT output by hashning while
traversing the input and stop when the LIMIT threshold is reached. No
sorting required and *at* *most* one read of input.

Use case: Websites that needs to present small samples of huge queries fast.

Responses

Re: Optimizing DISTINCT with LIMIT at 2008-12-04 14:09:34 from Gregory Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Gregory Stark	2008-12-04 13:48:05	Re: In-place upgrade: catalog side
Previous Message	Gregory Stark	2008-12-04 13:21:37	Assertion failure in new outer/semi/anti join code