Re: select distinct and index usage

From: "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, pgsql-general(at)postgresql(dot)org
Subject: Re: select distinct and index usage
Date: 2008-04-08 00:54:38
Message-ID: e7f9235d0804071754q52ccbe07h1c18df336c933564@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Apr 7, 2008 at 7:57 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
> You could try changing it to the equivalent GROUP BY query. The planner,
> unfortunately, doesn't know they're equivalent and has two separate sets of
> plans available. In this case where there are only 4,000 distinct values out
> of 75M original records you might find a HashAggregate plan, which the planner
> doesn't know can be used for DISTINCT, best. You might have to raise work_mem
> before the planner feels a hash will fit.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
>

Progress!

explain analyze select datestamp from vals group by datestamp;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1719740.40..1719783.03 rows=4263 width=4)
(actual time=120192.018..120193.930 rows=4252 loops=1)
-> Seq Scan on vals (cost=0.00..1531261.72 rows=75391472 width=4)
(actual time=17.441..66807.429 rows=75391476 loops=1)
Total runtime: 120195.144 ms

Compared with:

explain analyze select distinct datestamp from vals;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=15003047.47..15380004.83 rows=4263 width=4) (actual
time=649599.159..721671.595 rows=4252 loops=1)
-> Sort (cost=15003047.47..15191526.15 rows=75391472 width=4)
(actual time=649599.157..694392.602 rows=75391476 loops=1)
Sort Key: datestamp
Sort Method: external merge Disk: 1178592kB
-> Seq Scan on vals (cost=0.00..1531261.72 rows=75391472
width=4) (actual time=9.104..93130.468 rows=75391476 loops=1)
Total runtime: 722379.434 ms

Still doing the sequential scan on the table, but at least it's
avoiding the expensive disk merge sort. It still seems as if I ought
to be able to coax it into using an index for this type of query,
though- especially since it's using one on the other table. Is there
perhaps some way to reformulate the index in such a way as to make it
more useful to the planner?

--
- David T. Wilson
Princeton Satellite Systems
david(dot)t(dot)wilson(at)gmail(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2008-04-08 01:01:24 Re: select distinct and index usage
Previous Message Mike Ginsburg 2008-04-08 00:42:18 Most Occurring Value