Skip site navigation (1) Skip section navigation (2)

Re: Why Index is not used

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>,"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why Index is not used
Date: 2011-03-25 13:24:55
Message-ID: 4D8C97A7.8030403@peak6.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 03/25/2011 04:07 AM, Chetan Suttraway wrote:

> The ideas is to have maximum filtering occuring on leading column of index.
> the first plan with only the predicates on clause_id is returning
> 379772050555842 rows whereas
> in the second plan with doc_id predicates is returning only 20954686217.
>
> So maybe you should consider re-ordering of the index on clause2.

That won't really help him. He's joining a 27M row table against a 31M 
row table with basically no WHERE clause. We can see that because he's 
getting 30M rows back in the EXPLAIN ANALYZE. At that point, it doesn't 
really matter which table gets index scanned. This query will *always* 
take several minutes to execute.

It would be completely different if he only wanted to get the results 
for *one* source. Or *one* sentence. But getting all of them ever stored 
will just take forever.

> I am sorry but I am not able to got your points completely.

He just means that indexes work better if they're placed in order of 
selectivity. In your case, it seems sentence_id restricts the result set 
better than clause_id. So Chetan suggested remaking your indexes to be 
this instead:

CREATE INDEX idx_clause ON clause2
  USING btree (sentence_id, clause_id, source_id);

CREATE INDEX idx_svo2 ON svo2
  USING btree (sentence_id, clause_id, doc_id);

This *might* help. But your fundamental problem is that you're joining 
two giant tables with no clause to limit the result set. If you were 
only getting back 10,000 rows, or even a million rows, your query could 
execute in a fraction of the time. But joining every row in both tables 
and returning a 30-million row result set isn't going to be fun for 
anyone. Are you actually processing all 30-million rows you get back? 
Storing them somewhere?

-- 
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2011-03-25 14:12:10
Subject: Re: Shouldn't we have a way to avoid "risky" plans?
Previous:From: Laszlo NagyDate: 2011-03-25 11:56:14
Subject: Re: Slow query on CLUTER -ed tables

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group