sequential scan on select distinct

From: Ole Langbehn <ole(at)freiheit(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: sequential scan on select distinct
Date: 2004-10-06 09:30:58
Message-ID: 200410061130.58625.ole@freiheit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I'm using Postgres 7.4.5. Tables are analyzed & vacuumed.

I am wondering why postgresql never uses an index on queries of the type
'select distinct ...' while e.g. mysql uses the index on the same query.
See the following explains:

postgresql:

explain analyze select distinct "land" from "customer_dim";
---------------------------------------------------------------------------------------------------------------------------------------+
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------+
Unique (cost=417261.85..430263.66 rows=18 width=15) (actual time=45875.235..67204.694 rows=103 loops=1) |
-> Sort (cost=417261.85..423762.75 rows=2600362 width=15) (actual time=45875.226..54114.473 rows=2600362 loops=1) |
Sort Key: land |
-> Seq Scan on customer_dim (cost=0.00..84699.62 rows=2600362 width=15) (actual time=0.048..10733.227 rows=2600362 loops=1) |
Total runtime: 67246.465 ms |
---------------------------------------------------------------------------------------------------------------------------------------+

mysql:

explain select DISTINCT `customer_dim`.`land` from `customer_dim`;
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
table | type | possible_keys | key | key_len | ref | rows | Extra |
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
customer_dim | index | [NULL] | IDX_cstd_land | 81 | [NULL] | 2600362 | Using index |
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
1 row in result (first row: 8 msec; total: 9 msec)

The result set contains 103 rows (but i get this behavior with every query of
this kind). My tables consist of at least a million rows.

The indexes on the column 'land' are standard indexes, so in case of
postgresql, it's a btree-index. I've tried to change the index type, but to no
avail.

So, why doesn't postgresql use the index, and (how) could i persuade postgresql
to use an index for this type of query?

TiA

--
Ole Langbehn

freiheit.com technologies gmbh
Theodorstr. 42-90 / 22761 Hamburg, Germany
fon       +49 (0)40 / 890584-0
fax       +49 (0)40 / 890584-20

Freie Software durch Bücherkauf fördern | http://bookzilla.de/

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alban Médici (NetCentrex) 2004-10-06 10:15:31 stats on cursor and query execution troubleshooting
Previous Message Greg Stark 2004-10-06 07:44:02 Re: Excessive context switching on SMP Xeons