Skip site navigation (1) Skip section navigation (2)

Re: Reason of Slowness of query

From: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Reason of Slowness of query
Date: 2011-03-23 07:30:26
Message-ID: 4D89A192.7000407@orkash.com (view raw or flat)
Thread:
Lists: pgsql-performance
Thanks Chetan, here is the output of your updated query :


*explain  select distinct(p.crawled_page_id) from page_content p where 
NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);

*
                                      QUERY 
PLAN                                      
---------------------------------------------------------------------------------------
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 
rows=31876196 width=4)
(6 rows)

And my explain analyze output is :

                                                      QUERY 
PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8) 
(actual time=56666.181..56669.270 rows=72 loops=1)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8) 
(actual time=45740.789..56665.816 rows=74 loops=1)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4) 
(actual time=45310.524..45310.524 rows=31853083 loops=1)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 
rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)
 Total runtime: 56687.660 ms
(7 rows)

But Is there is any option to tune it further and one more thing output 
rows varies from 6 to 7.


Thanks & best Regards,
Adarsh Sharma






Chetan Suttraway wrote:
>
>
> On Wed, Mar 23, 2011 at 11:58 AM, Adarsh Sharma 
> <adarsh(dot)sharma(at)orkash(dot)com <mailto:adarsh(dot)sharma(at)orkash(dot)com>> wrote:
>
>     Dear all,
>
>     I have 2 tables in my database name clause2( 4900 MB) &
>     page_content(1582 MB).
>
>     My table definations are as :
>
>     *page_content :-
>
>     *CREATE TABLE page_content
>     (
>       content_id integer,
>       wkb_geometry geometry,
>       link_level integer,
>       isprocessable integer,
>       isvalid integer,
>       isanalyzed integer,
>       islocked integer,
>       content_language character(10),
>       url_id integer,
>       publishing_date character(40),
>       heading character(150),
>       category character(150),
>       crawled_page_url character(500),
>       keywords character(500),
>       dt_stamp timestamp with time zone,
>       "content" character varying,
>       crawled_page_id bigint,
>       id integer
>     )
>     WITH (
>       OIDS=FALSE
>     );
>
>     *Indexes on it :-*
>     CREATE INDEX idx_page_id  ON page_content  USING btree 
>     (crawled_page_id);
>     CREATE INDEX idx_page_id_content   ON page_content  USING btree 
>     (crawled_page_id, content_language, publishing_date, isprocessable);
>     CREATE INDEX pgweb_idx  ON page_content   USING gin  
>     (to_tsvector('english'::regconfig, content::text));
>
>     *clause 2:-
>     *CREATE TABLE clause2
>     (
>       id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
>       source_id integer,
>       sentence_id integer,
>       clause_id integer,
>       tense character varying(30),
>       clause text,
>       CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
>     )WITH ( OIDS=FALSE);
>
>     *Indexes on it :
>
>     *CREATE INDEX idx_clause2_march10
>       ON clause2
>       USING btree
>       (id, source_id);*
>
>     *I perform a join query on it as :
>
>     * explain analyze select distinct(p.crawled_page_id) from
>     page_content p , clause2  c where p.crawled_page_id != c.source_id ;
>
>     *What it takes more than 1 hour to complete. As I issue the
>     explain analyze command and cannot able to wait for output but I
>     send my explain output as :
>                                                  QUERY
>     PLAN                                              
>     --------------------------------------------------------------------------------------------------------
>      Unique  (cost=927576.16..395122387390.13 rows=382659 width=8)
>        ->  Nested Loop  (cost=927576.16..360949839832.15
>     rows=13669019023195 width=8)
>              Join Filter: (p.crawled_page_id <> c.source_id)
>              ->  Index Scan using idx_page_id on page_content p 
>     (cost=0.00..174214.02 rows=428817 width=8)
>              ->  Materialize  (cost=927576.16..1370855.12
>     rows=31876196 width=4)
>                    ->  Seq Scan on clause2 c  (cost=0.00..771182.96
>     rows=31876196 width=4)
>     (6 rows)
>
>
>     Please guide me how to make the above query run faster as I am not
>     able to do that.
>
>
>     Thanks, Adarsh
>
>     *
>
>     *
>
>
> Could you try just explaining the below query:
> explain  select distinct(p.crawled_page_id) from page_content p where 
> NOT EXISTS (select 1 from  clause2 c where c.source_id = 
> p.crawled_page_id);
>
> The idea here is to avoid directly using NOT operator.
>
>
>
> Regards,
> Chetan
>
> -- 
> Chetan Suttraway
> EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise PostgreSQL 
> <http://www.enterprisedb.com/> company.
>
>
>

In response to

Responses

pgsql-performance by date

Next:From: Vitalii TymchyshynDate: 2011-03-23 08:51:25
Subject: Re: Reason of Slowness of query
Previous:From: Chetan SuttrawayDate: 2011-03-23 07:25:35
Subject: Re: Reason of Slowness of query

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group