Re: Why query takes soo much time

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why query takes soo much time
Date: 2011-05-16 10:00:03
Message-ID: 4DD0F5A3.1060007@postnewspapers.com.au
Lists: pgsql-performance

On 05/16/2011 01:39 PM, Adarsh Sharma wrote:
> Dear all,
> I have a query on 3 tables in a database as :-
>
> Explain Analyze Output:
>
> explain analyze select c.clause, s.subject ,s.object , s.verb,
> s.subject_type , s.object_type ,s.doc_id ,s.svo_id from clause2 c,
> svo2 s ,page_content p where c.clause_id=s.clause_id and
> s.doc_id=c.source_id and c.sentence_id=s.sentence_id and
> s.doc_id=p.crawled_page_id order by s.svo_id limit 1000 offset 17929000
>

Using limit and offset can be horrifyingly slow for non-trivial queries.
Are you trying to paginate results? If not, what are you trying to achieve?

In most (all?) cases, Pg has to execute the query up to the point where
it has found limit+offset rows, producing and then discarding the first
"offset" rows as it goes. With an offset of nearly 18 million rows,
that's horrifyingly inefficient.
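If this really is pagination, the usual alternative is keyset ("seek")
pagination: have the client remember the last svo_id it received and
filter on that instead of using OFFSET. A sketch only — the
:last_seen_svo_id placeholder stands for whatever value your client
tracked from the previous page:

```sql
-- Fetch the next page after the last svo_id the client saw. The
-- planner can walk pk_svo_id from that point instead of producing
-- and discarding ~18 million rows.
SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
       s.object_type, s.doc_id, s.svo_id
FROM clause2 c
JOIN svo2 s ON (c.clause_id = s.clause_id AND
                c.source_id = s.doc_id AND
                c.sentence_id = s.sentence_id)
JOIN page_content p ON (s.doc_id = p.crawled_page_id)
WHERE s.svo_id > :last_seen_svo_id   -- value from the previous page
ORDER BY s.svo_id
LIMIT 1000;
```

This only works when you page strictly forward on a unique ordered key,
as you appear to be doing here with svo_id.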

Reformatting your query for readability (to me) as:

EXPLAIN ANALYZE
SELECT c.clause, s.subject ,s.object , s.verb, s.subject_type,
s.object_type ,s.doc_id ,s.svo_id
FROM clause2 c INNER JOIN svo2 s ON (c.clause_id=s.clause_id AND
c.source_id=s.doc_id AND c.sentence_id=s.sentence_id)
INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
ORDER BY s.svo_id LIMIT 1000 OFFSET 17929000

... I can see that you're joining on
(c.clause_id,c.source_id,c.sentence_id)=(s.clause_id,s.doc_id,s.sentence_id).
You have matching indexes idx_clause2_id and idx_svo2_id_dummy with
matching column ordering. Pg is using idx_clause2_id in the join of svo2
and clause2, but instead of doing a bitmap index scan using it and
idx_svo2_id_dummy it's doing a nested loop using idx_clause2_id and
pk_svo_id.
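As a diagnostic (not something to leave set in production), you can
nudge the planner away from the nested loop to see what its second
choice would be and what it thinks that plan costs:

```sql
-- Session-local experiment: forbid nested loop joins so the planner
-- shows its alternative (e.g. a hash or merge join / bitmap scan plan)
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT ... ;  -- the original query
RESET enable_nestloop;
```

If the alternative plan is actually faster, that's a strong hint the
cost settings or statistics are off, not that you should keep the
enable_* override.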

First: make sure your stats are up to date by ANALYZE-ing your tables
and probably increasing the stats collected on the join columns and/or
increasing default_statistics_target. If that doesn't help, personally
I'd play with the random_page_cost and seq_page_cost to see if they
reflect your machine's actual performance, and to see if you get a more
favourable plan. If I were experimenting with this I'd also see if
giving the query lots of work_mem allowed it to try a different approach
to the join.
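By way of illustration — the specific numbers below are placeholders to
experiment with, not recommendations:

```sql
-- Refresh planner statistics on the tables involved
ANALYZE clause2;
ANALYZE svo2;
ANALYZE page_content;

-- Collect more detailed stats on a join column, then re-analyze
ALTER TABLE svo2 ALTER COLUMN doc_id SET STATISTICS 1000;
ANALYZE svo2;

-- Session-local experiments with planner costs and sort/hash memory
SET random_page_cost = 2.0;   -- illustrative; depends on your storage
SET work_mem = '256MB';       -- illustrative; applies per sort/hash node
EXPLAIN ANALYZE SELECT ... ;  -- re-run the query and compare the plans
```

SET only affects your session, so you can try values freely before
committing anything to postgresql.conf.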

> "Limit (cost=21685592.91..21686802.44 rows=1000 width=2624) (actual
> time=414601.802..414622.920 rows=1000 loops=1)"
> " -> Nested Loop (cost=59.77..320659013645.28 rows=265112018116
> width=2624) (actual time=0.422..404902.314 rows=17930000 loops=1)"
> " -> Nested Loop (cost=0.00..313889654.42 rows=109882338
> width=2628) (actual time=0.242..174223.789 rows=17736897 loops=1)"
> " -> Index Scan using pk_svo_id on svo2 s
> (cost=0.00..33914955.13 rows=26840752 width=2600) (actual
> time=0.157..14691.039 rows=14238271 loops=1)"
> " -> Index Scan using idx_clause2_id on clause2 c
> (cost=0.00..10.36 rows=4 width=44) (actual time=0.007..0.008 rows=1
> loops=14238271)"
> " Index Cond: ((c.source_id = s.doc_id) AND
> (c.clause_id = s.clause_id) AND (c.sentence_id = s.sentence_id))"
> " -> Bitmap Heap Scan on page_content p (cost=59.77..2885.18
> rows=2413 width=8) (actual time=0.007..0.008 rows=1 loops=17736897)"
> " Recheck Cond: (p.crawled_page_id = s.doc_id)"
> " -> Bitmap Index Scan on idx_crawled_id
> (cost=0.00..59.17 rows=2413 width=0) (actual time=0.005..0.005 rows=1
> loops=17736897)"
> " Index Cond: (p.crawled_page_id = s.doc_id)"
> "Total runtime: 414623.634 ms"
>
> My table & index definitions are as under:
>
> Estimated rows in the 3 tables are:
>
> clause2 10341700
> svo2 26008000
> page_content 479785
>
> CREATE TABLE clause2
> (
> id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
> source_id integer,
> sentence_id integer,
> clause_id integer,
> tense character varying(30),
> clause text,
> CONSTRAINT pk_clause_id PRIMARY KEY (id)
> )WITH ( OIDS=FALSE);
> CREATE INDEX idx_clause2_id ON clause2 USING btree (source_id,
> clause_id, sentence_id);
>
> CREATE TABLE svo2
> (
> svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
> doc_id integer,
> sentence_id integer,
> clause_id integer,
> negation integer,
> subject character varying(3000),
> verb character varying(3000),
> "object" character varying(3000),
> preposition character varying(3000),
> subject_type character varying(3000),
> object_type character varying(3000),
> subject_attribute character varying(3000),
> object_attribute character varying(3000),
> verb_attribute character varying(3000),
> subject_concept character varying(100),
> object_concept character varying(100),
> subject_sense character varying(100),
> object_sense character varying(100),
> subject_chain character varying(5000),
> object_chain character varying(5000),
> sub_type_id integer,
> obj_type_id integer,
> CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
> )WITH ( OIDS=FALSE);
> CREATE INDEX idx_svo2_id_dummy ON svo2 USING btree (doc_id,
> clause_id, sentence_id);
>
> CREATE TABLE page_content
> (
> content_id integer NOT NULL DEFAULT
> nextval('page_content_ogc_fid_seq'::regclass),
> wkb_geometry geometry,
> link_level integer,
> isprocessable integer,
> isvalid integer,
> isanalyzed integer,
> islocked integer,
> content_language character(10),
> url_id integer,
> publishing_date character(40),
> heading character(150),
> category character(150),
> crawled_page_url character(500),
> keywords character(500),
> dt_stamp timestamp with time zone,
> "content" character varying,
> crawled_page_id bigint,
> CONSTRAINT page_content_pk PRIMARY KEY (content_id),
> CONSTRAINT enforce_dims_wkb_geometry CHECK (st_ndims(wkb_geometry) = 2),
> CONSTRAINT enforce_srid_wkb_geometry CHECK (st_srid(wkb_geometry) =
> (-1))
> )WITH ( OIDS=FALSE);
> CREATE INDEX idx_crawled_id ON page_content USING btree
> (crawled_page_id);
> CREATE INDEX pgweb_idx ON page_content USING gin
> (to_tsvector('english'::regconfig, content::text));
>
> If possible, please let me know if I am doing something wrong, or
> suggest an alternate query to run it faster.
>
>
> Thanks
