Skip site navigation (1) Skip section navigation (2)

Re: Index not used on group by

From: Richard Huxton <dev(at)archonet(dot)com>
To: Андрей Репко <repko(at)sart(dot)must-ipra(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index not used on group by
Date: 2005-09-27 14:34:02
Message-ID: 4339585A.9040507@archonet.com (view raw or flat)
Thread:
Lists: pgsql-performance
Андрей Репко wrote:
> Здравствуйте Richard,
> 
> Tuesday, September 27, 2005, 2:08:31 PM, Вы писали:
> 
> 
> 
>>>sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>>>                                                       QUERY PLAN
>>>-------------------------------------------------------------------------------------------------------------------------
>>> HashAggregate  (cost=38565.30..38565.62 rows=32 width=4)
>>>(actual time=15990.863..15990.933 rows=32 loops=1)
>>>   ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624
>>>width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>>> Total runtime: 15991.244 ms
> 
> 
> RH> OK - the planner thinks it's doing the right thing, your cost estimates
> RH> are way off. If you look back at where you got an index-scan, it's cost
> RH> was 1.1 million.
> RH>    Index Scan using reference_9_fk on ma_data (cost=0.00..1139526.57
> But why PG scan _all_ the records in the table? As I understand we can
> "just" select information from index, not scaning all the table? Of
> course if we select ALL records from table index can't help us.

Actually, if you select more than 5-10% of the rows (in general) you are 
better off using a seq-scan.

PostgreSQL estimates the total cost of possible query plans and picks 
the cheapest. In your case your configuration settings seem to be 
pushing the cost of an index scan much higher than it is. So, it picks 
the sequential-scan.

> If I write something like:
> SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =1 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =2 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =3 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =4 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =5 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =6 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =7 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =8 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =9 LIMIT 1)
> ...
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id=max_alias_id LIMIT 1)
> It works better, much better.

Of course - it will always choose index queries here - it can see you 
are only fetching one row in each subquery.

Correct your configuration settings so PG estimates the cost of an index 
  query correctly and all should be well.
--
   Richard Huxton
   Archonet Ltd


In response to

pgsql-performance by date

Next:From: DarioDate: 2005-09-27 15:12:53
Subject: Re: slow database, queries accumulating
Previous:From: Gnanavel SDate: 2005-09-27 14:20:35
Subject: Re: [PERFORM] PostgreSQL overall design

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group