Re: Reason of Slowness of query

From: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
To: Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Reason of Slowness of query
Date: 2011-03-23 07:30:26
Message-ID: 4D89A192.7000407@orkash.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Chetan, here is the output of your updated query :

*explain select distinct(p.crawled_page_id) from page_content p where
NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

*
QUERY
PLAN
---------------------------------------------------------------------------------------
HashAggregate (cost=1516749.47..1520576.06 rows=382659 width=8)
-> Hash Anti Join (cost=1294152.41..1515791.80 rows=383071 width=8)
Hash Cond: (p.crawled_page_id = c.source_id)
-> Seq Scan on page_content p (cost=0.00..87132.17
rows=428817 width=8)
-> Hash (cost=771182.96..771182.96 rows=31876196 width=4)
-> Seq Scan on clause2 c (cost=0.00..771182.96
rows=31876196 width=4)
(6 rows)

And my explain analyze output is :

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1516749.47..1520576.06 rows=382659 width=8)
(actual time=56666.181..56669.270 rows=72 loops=1)
-> Hash Anti Join (cost=1294152.41..1515791.80 rows=383071 width=8)
(actual time=45740.789..56665.816 rows=74 loops=1)
Hash Cond: (p.crawled_page_id = c.source_id)
-> Seq Scan on page_content p (cost=0.00..87132.17
rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
-> Hash (cost=771182.96..771182.96 rows=31876196 width=4)
(actual time=45310.524..45310.524 rows=31853083 loops=1)
-> Seq Scan on clause2 c (cost=0.00..771182.96
rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)
Total runtime: 56687.660 ms
(7 rows)

But Is there is any option to tune it further and one more thing output
rows varies from 6 to 7.

Thanks & best Regards,
Adarsh Sharma

Chetan Suttraway wrote:
>
>
> On Wed, Mar 23, 2011 at 11:58 AM, Adarsh Sharma
> <adarsh(dot)sharma(at)orkash(dot)com <mailto:adarsh(dot)sharma(at)orkash(dot)com>> wrote:
>
> Dear all,
>
> I have 2 tables in my database name clause2( 4900 MB) &
> page_content(1582 MB).
>
> My table definations are as :
>
> *page_content :-
>
> *CREATE TABLE page_content
> (
> content_id integer,
> wkb_geometry geometry,
> link_level integer,
> isprocessable integer,
> isvalid integer,
> isanalyzed integer,
> islocked integer,
> content_language character(10),
> url_id integer,
> publishing_date character(40),
> heading character(150),
> category character(150),
> crawled_page_url character(500),
> keywords character(500),
> dt_stamp timestamp with time zone,
> "content" character varying,
> crawled_page_id bigint,
> id integer
> )
> WITH (
> OIDS=FALSE
> );
>
> *Indexes on it :-*
> CREATE INDEX idx_page_id ON page_content USING btree
> (crawled_page_id);
> CREATE INDEX idx_page_id_content ON page_content USING btree
> (crawled_page_id, content_language, publishing_date, isprocessable);
> CREATE INDEX pgweb_idx ON page_content USING gin
> (to_tsvector('english'::regconfig, content::text));
>
> *clause 2:-
> *CREATE TABLE clause2
> (
> id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
> source_id integer,
> sentence_id integer,
> clause_id integer,
> tense character varying(30),
> clause text,
> CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
> )WITH ( OIDS=FALSE);
>
> *Indexes on it :
>
> *CREATE INDEX idx_clause2_march10
> ON clause2
> USING btree
> (id, source_id);*
>
> *I perform a join query on it as :
>
> * explain analyze select distinct(p.crawled_page_id) from
> page_content p , clause2 c where p.crawled_page_id != c.source_id ;
>
> *What it takes more than 1 hour to complete. As I issue the
> explain analyze command and cannot able to wait for output but I
> send my explain output as :
> QUERY
> PLAN
> --------------------------------------------------------------------------------------------------------
> Unique (cost=927576.16..395122387390.13 rows=382659 width=8)
> -> Nested Loop (cost=927576.16..360949839832.15
> rows=13669019023195 width=8)
> Join Filter: (p.crawled_page_id <> c.source_id)
> -> Index Scan using idx_page_id on page_content p
> (cost=0.00..174214.02 rows=428817 width=8)
> -> Materialize (cost=927576.16..1370855.12
> rows=31876196 width=4)
> -> Seq Scan on clause2 c (cost=0.00..771182.96
> rows=31876196 width=4)
> (6 rows)
>
>
> Please guide me how to make the above query run faster as I am not
> able to do that.
>
>
> Thanks, Adarsh
>
> *
>
> *
>
>
> Could you try just explaining the below query:
> explain select distinct(p.crawled_page_id) from page_content p where
> NOT EXISTS (select 1 from clause2 c where c.source_id =
> p.crawled_page_id);
>
> The idea here is to avoid directly using NOT operator.
>
>
>
> Regards,
> Chetan
>
> --
> Chetan Suttraway
> EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise PostgreSQL
> <http://www.enterprisedb.com/> company.
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2011-03-23 08:51:25 Re: Reason of Slowness of query
Previous Message Chetan Suttraway 2011-03-23 07:25:35 Re: Reason of Slowness of query