Re: Why Index is not used

From: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: Why Index is not used
Date: 2011-03-25 09:53:24
Message-ID: 4D8C6614.1050007@orkash.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chetan Suttraway wrote:
>
>
> On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma
> <adarsh(dot)sharma(at)orkash(dot)com <mailto:adarsh(dot)sharma(at)orkash(dot)com>> wrote:
>
>>
>> Could you please post output of below queries:
>> explain select c.clause, s.* from clause2 c, svo2 s where
>> c.clause_id=s.clause_id;
>> explain select c.clause, s.* from clause2 c, svo2 s where
>> s.doc_id=c.source_id;
>> explain select c.clause, s.* from clause2 c, svo2 s where
>> c.sentence_id=s.sentence_id ;
>
>
> As per your instructions, Please check the below output :-
>
> pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
> where c.clause_id=s.clause_id;
> QUERY
> PLAN
> ---------------------------------------------------------------------------------
> Hash Join (cost=7828339.10..4349603998133.96
> rows=379772050555842 width=2053)
> Hash Cond: (c.clause_id = s.clause_id)
> -> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084
> width=64)
> -> Hash (cost=697537.60..697537.60 rows=27471560 width=1993)
> -> Seq Scan on svo2 s (cost=0.00..697537.60
> rows=27471560 width=1993)
> (5 rows)
>
> pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
> where s.doc_id=c.source_id;
> QUERY
> PLAN
> ---------------------------------------------------------------------------------------
> Merge Join (cost=43635232.12..358368926.66 rows=20954686217
> width=2053)
> Merge Cond: (c.source_id = s.doc_id)
> -> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
> Sort Key: c.source_id
> -> Seq Scan on clause2 c (cost=0.00..770951.84
> rows=31853084 width=64)
> -> Materialize (cost=38028881.02..38372275.52 rows=27471560
> width=1993)
> -> Sort (cost=38028881.02..38097559.92 rows=27471560
> width=1993)
> Sort Key: s.doc_id
> -> Seq Scan on svo2 s (cost=0.00..697537.60
> rows=27471560 width=1993)
> (9 rows)
>
> pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
> where c.sentence_id=s.sentence_id ;
> QUERY
> PLAN
> ---------------------------------------------------------------------------------------
> Merge Join (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
> Merge Cond: (c.sentence_id = s.sentence_id)
> -> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
> Sort Key: c.sentence_id
> -> Seq Scan on clause2 c (cost=0.00..770951.84
> rows=31853084 width=64)
> -> Materialize (cost=38028881.02..38372275.52 rows=27471560
> width=1993)
> -> Sort (cost=38028881.02..38097559.92 rows=27471560
> width=1993)
> Sort Key: s.sentence_id
> -> Seq Scan on svo2 s (cost=0.00..697537.60
> rows=27471560 width=1993)
> (9 rows)
>
> Please let me know if any other information is required.
>
>
>
>
>
>>
>> --
>> Best Regards,
>> Adarsh Sharma
>>
>>
>
> The ideas is to have maximum filtering occuring on leading column of
> index.
> the first plan with only the predicates on clause_id is returning
> 379772050555842 rows whereas
> in the second plan with doc_id predicates is returning only 20954686217.
>
> So maybe you should consider re-ordering of the index on clause2.
>
> I am thinking that you created the indexes by looking at the columns
> used in the where clause.
> But its not always helpful to create indexes based on exact order of
> predicates specified in query.
> Instead the idea should be consider the predicate which is going to do
> filter out the results.
> Likewise we should consider all possible uses of index columns across
> all queries and then decide on the
> order of columns for the composite index to be created.
>
> Whats your take on this?

I am sorry but I am not able to got your points completely.

My table definitions are as :

*Clause2 Table :

*CREATE TABLE clause2
(
id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
source_id integer,
sentence_id integer,
clause_id integer,
tense character varying(30),
clause text,
CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id,
sentence_id);

*svo2 table :*--

CREATE TABLE svo2
(
svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
doc_id integer,
sentence_id integer,
clause_id integer,
negation integer,
subject character varying(3000),
verb character varying(3000),
"object" character varying(3000),
preposition character varying(3000),
subject_type character varying(3000),
object_type character varying(3000),
subject_attribute character varying(3000),
object_attribute character varying(3000),
verb_attribute character varying(3000),
subject_concept character varying(100),
object_concept character varying(100),
subject_sense character varying(100),
object_sense character varying(100),
subject_chain character varying(5000),
object_chain character varying(5000),
sub_type_id integer,
obj_type_id integer,
CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id,
sentence_id);

Please correct me if I m wrong.

I need to change the order of columns in indexes according to the filter
conditions but in this query .

After making

set enable_mergejoin = false
and random_page_cost =2.0

The problem remains the same.

What is your recommendations for the new index so that the query runs
even faster.

I can change my original query to :

explain analyze select
c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type
from clause2 c, svo2 s where c.clause_id=s.clause_id and
s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;

And the output is :

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..128419720.68 rows=167324179 width=105) (actual
time=11.179..285708.966 rows=30473117 loops=1)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=53)
(actual time=0.013..19554.222 rows=27471560 loops=1)
-> Index Scan using idx_clause on clause2 c (cost=0.00..4.63 rows=1
width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id =
s.doc_id) AND (c.sentence_id = s.sentence_id))
Total runtime: 301599.274 ms

Thanks & best Regards,
Adarsh Sharma

> Regards,
> Chetan Suttraway
> EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise PostgreSQL
> <http://www.enterprisedb.com/> company.
>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Laszlo Nagy 2011-03-25 11:56:14 Re: Slow query on CLUTER -ed tables
Previous Message Vitalii Tymchyshyn 2011-03-25 09:43:14 Re: Shouldn't we have a way to avoid "risky" plans?