Quick Links

Re: select distinct and index usage

From:	"David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, pgsql-general(at)postgresql(dot)org
Subject:	Re: select distinct and index usage
Date:	2008-04-08 00:54:38
Message-ID:	e7f9235d0804071754q52ccbe07h1c18df336c933564@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mon, Apr 7, 2008 at 7:57 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
> You could try changing it to the equivalent GROUP BY query. The planner,
> unfortunately, doesn't know they're equivalent and has two separate sets of
> plans available. In this case where there are only 4,000 distinct values out
> of 75M original records you might find a HashAggregate plan, which the planner
> doesn't know can be used for DISTINCT, best. You might have to raise work_mem
> before the planner feels a hash will fit.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
>

Progress!

explain analyze select datestamp from vals group by datestamp;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1719740.40..1719783.03 rows=4263 width=4)
(actual time=120192.018..120193.930 rows=4252 loops=1)
-> Seq Scan on vals (cost=0.00..1531261.72 rows=75391472 width=4)
(actual time=17.441..66807.429 rows=75391476 loops=1)
Total runtime: 120195.144 ms

Compared with:

explain analyze select distinct datestamp from vals;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=15003047.47..15380004.83 rows=4263 width=4) (actual
time=649599.159..721671.595 rows=4252 loops=1)
-> Sort (cost=15003047.47..15191526.15 rows=75391472 width=4)
(actual time=649599.157..694392.602 rows=75391476 loops=1)
Sort Key: datestamp
Sort Method: external merge Disk: 1178592kB
-> Seq Scan on vals (cost=0.00..1531261.72 rows=75391472
width=4) (actual time=9.104..93130.468 rows=75391476 loops=1)
Total runtime: 722379.434 ms

Still doing the sequential scan on the table, but at least it's
avoiding the expensive disk merge sort. It still seems as if I ought
to be able to coax it into using an index for this type of query,
though- especially since it's using one on the other table. Is there
perhaps some way to reformulate the index in such a way as to make it
more useful to the planner?

--
- David T. Wilson
Princeton Satellite Systems
david(dot)t(dot)wilson(at)gmail(dot)com

In response to

Re: select distinct and index usage at 2008-04-07 23:57:21 from Gregory Stark

Responses

Re: select distinct and index usage at 2008-04-08 01:01:24 from Alvaro Herrera
Re: select distinct and index usage at 2008-04-08 01:11:50 from Stephen Denne

Browse pgsql-general by date

	From	Date	Subject
Next Message	Alvaro Herrera	2008-04-08 01:01:24	Re: select distinct and index usage
Previous Message	Mike Ginsburg	2008-04-08 00:42:18	Most Occurring Value