Re: Why Index is not used

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why Index is not used
Date: 2011-03-25 13:24:55
Message-ID: 4D8C97A7.8030403@peak6.com
Lists: pgsql-performance

On 03/25/2011 04:07 AM, Chetan Suttraway wrote:

> The idea is to have maximum filtering occur on the leading column of the index.
> The first plan, with only the predicates on clause_id, is returning
> 379772050555842 rows, whereas
> the second plan, with the doc_id predicates, is returning only 20954686217.
>
> So maybe you should consider re-ordering the index on clause2.

That won't really help him. He's joining a 27M-row table against a 31M-row
table with basically no WHERE clause. We can see that because he's
getting 30M rows back in the EXPLAIN ANALYZE. At that point, it doesn't
really matter which table gets index-scanned. This query will *always*
take several minutes to execute.

It would be completely different if he only wanted the results for *one*
source, or *one* sentence. But fetching every row ever stored will just
take forever.
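As a sketch of what I mean, a single predicate on the leading index
column changes everything (column names are taken from the index
definitions below; the join condition and the literal 42 are assumptions
for illustration):

SELECT c.clause_id, s.doc_id
FROM clause2 c
JOIN svo2 s USING (sentence_id, clause_id)
WHERE c.sentence_id = 42;

With a predicate like that, the planner can index-scan both tables on
sentence_id instead of walking all 27M+ rows.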

> I am sorry, but I was not able to follow your points completely.

He just means that indexes work better when their columns are ordered by
selectivity. In your case, it seems sentence_id restricts the result set
better than clause_id does. So Chetan suggested remaking your indexes
like this instead:

CREATE INDEX idx_clause ON clause2
USING btree (sentence_id, clause_id, source_id);

CREATE INDEX idx_svo2 ON svo2
USING btree (sentence_id, clause_id, doc_id);

This *might* help. But your fundamental problem is that you're joining
two giant tables with no clause to limit the result set. If you were
only getting back 10,000 rows, or even a million rows, your query could
execute in a fraction of the time. But joining every row in both tables
and returning a 30-million-row result set isn't going to be fun for
anyone. Are you actually processing all 30 million rows you get back?
Storing them somewhere?
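If you genuinely do need all 30 million rows, consider pulling them
through a server-side cursor so the client isn't forced to materialize
the whole result set at once. A rough sketch (cursor name, column list,
and batch size are arbitrary; the join condition is assumed):

BEGIN;
DECLARE big_join CURSOR FOR
  SELECT c.clause_id, s.doc_id
  FROM clause2 c
  JOIN svo2 s USING (sentence_id, clause_id);
FETCH FORWARD 10000 FROM big_join;
-- repeat the FETCH until no rows come back
CLOSE big_join;
COMMIT;

That doesn't make the join any cheaper on the server, but it keeps
client-side memory flat while you process the batches.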

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email
