Skip site navigation (1) Skip section navigation (2)

Re: Why Index is not used

From: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: Why Index is not used
Date: 2011-03-25 09:53:24
Message-ID: 4D8C6614.1050007@orkash.com (view raw or flat)
Thread:
Lists: pgsql-performance
Chetan Suttraway wrote:
>
>
> On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma 
> <adarsh(dot)sharma(at)orkash(dot)com <mailto:adarsh(dot)sharma(at)orkash(dot)com>> wrote:
>
>>
>>     Could you please post output of below queries:
>>     explain select c.clause, s.* from clause2 c, svo2 s where
>>     c.clause_id=s.clause_id;
>>     explain select c.clause, s.* from clause2 c, svo2 s where
>>     s.doc_id=c.source_id;
>>     explain select c.clause, s.* from clause2 c, svo2 s where
>>     c.sentence_id=s.sentence_id ;
>
>
>     As per your instructions, Please  check the below output :-
>
>     pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
>     where c.clause_id=s.clause_id;
>                                        QUERY
>     PLAN                                   
>     ---------------------------------------------------------------------------------
>      Hash Join  (cost=7828339.10..4349603998133.96
>     rows=379772050555842 width=2053)
>        Hash Cond: (c.clause_id = s.clause_id)
>        ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084
>     width=64)
>        ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
>              ->  Seq Scan on svo2 s  (cost=0.00..697537.60
>     rows=27471560 width=1993)
>     (5 rows)
>
>     pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
>     where s.doc_id=c.source_id;
>                                           QUERY
>     PLAN                                      
>     ---------------------------------------------------------------------------------------
>      Merge Join  (cost=43635232.12..358368926.66 rows=20954686217
>     width=2053)
>        Merge Cond: (c.source_id = s.doc_id)
>        ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
>              Sort Key: c.source_id
>              ->  Seq Scan on clause2 c  (cost=0.00..770951.84
>     rows=31853084 width=64)
>        ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560
>     width=1993)
>              ->  Sort  (cost=38028881.02..38097559.92 rows=27471560
>     width=1993)
>                    Sort Key: s.doc_id
>                    ->  Seq Scan on svo2 s  (cost=0.00..697537.60
>     rows=27471560 width=1993)
>     (9 rows)
>
>     pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s
>     where c.sentence_id=s.sentence_id ;
>                                           QUERY
>     PLAN                                      
>     ---------------------------------------------------------------------------------------
>      Merge Join  (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
>        Merge Cond: (c.sentence_id = s.sentence_id)
>        ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
>              Sort Key: c.sentence_id
>              ->  Seq Scan on clause2 c  (cost=0.00..770951.84
>     rows=31853084 width=64)
>        ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560
>     width=1993)
>              ->  Sort  (cost=38028881.02..38097559.92 rows=27471560
>     width=1993)
>                    Sort Key: s.sentence_id
>                    ->  Seq Scan on svo2 s  (cost=0.00..697537.60
>     rows=27471560 width=1993)
>     (9 rows)
>
>     Please  let me know if any other information is required.
>
>
>
>
>
>>
>>     -- 
>>     Best Regards,
>>     Adarsh Sharma
>>
>>
>
> The ideas is to have maximum filtering occuring on leading column of 
> index.
> the first plan with only the predicates on clause_id is returning 
> 379772050555842 rows whereas
> in the second plan with doc_id predicates is returning only 20954686217.
>
> So maybe you should consider re-ordering of the index on clause2.
>
> I am thinking that you created the indexes by looking at the columns 
> used in the where clause.
> But its not always helpful to create  indexes based on exact order of 
> predicates specified in query.
> Instead the idea should be consider the predicate which is going to do 
> filter out the results.
> Likewise we should consider all possible uses of index columns across 
> all queries and then decide on the
> order of columns for the composite index to be created.
>
> Whats your take on this?

I am sorry but I am not able to got your points completely.

My table definitions are as :

*Clause2 Table :

*CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, 
sentence_id);

*svo2 table :*--

CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_svo2  ON svo2  USING btree  (clause_id, doc_id, 
sentence_id);

Please correct me if I m wrong.

I need to change the order of columns in indexes according to the filter 
conditions but in this query .

After making

set enable_mergejoin = false
and random_page_cost =2.0

The problem remains the same.






What is your recommendations for the new index so that the query runs 
even faster.


I can change my original query to :

explain analyze select 
c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type 
from clause2 c, svo2 s where c.clause_id=s.clause_id and 
s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;

And the output is :

                                                             QUERY 
PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..128419720.68 rows=167324179 width=105) (actual 
time=11.179..285708.966 rows=30473117 loops=1)
   ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=53) 
(actual time=0.013..19554.222 rows=27471560 loops=1)
   ->  Index Scan using idx_clause on clause2 c  (cost=0.00..4.63 rows=1 
width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
         Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id = 
s.doc_id) AND (c.sentence_id = s.sentence_id))
 Total runtime: 301599.274 ms


Thanks & best Regards,
Adarsh Sharma


> Regards,
> Chetan Suttraway
> EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise PostgreSQL 
> <http://www.enterprisedb.com/> company.
>
>
>

In response to

pgsql-performance by date

Next:From: Laszlo NagyDate: 2011-03-25 11:56:14
Subject: Re: Slow query on CLUTER -ed tables
Previous:From: Vitalii TymchyshynDate: 2011-03-25 09:43:14
Subject: Re: Shouldn't we have a way to avoid "risky" plans?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group