Skip site navigation (1) Skip section navigation (2)

Nested Loop trouble : Execution time increases more 1000 time (long)

From: Antoine Bajolet <antoine(dot)bajolet(at)free(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Nested Loop trouble : Execution time increases more 1000 time (long)
Date: 2005-09-17 15:47:18
Message-ID: 432C3A86.5010000@free.fr (view raw or flat)
Thread:
Lists: pgsql-performance
Hello,

We are using postgresql in a search engine on an intranet handling 
throusand of documents.
But we ave a big problem when users use more than two search key.

There are more tables around, but the heart of  the search engine is 
made of three tables :

fiches (f_id int4, f_title varchar)          52445 rows
engine (f_id int4, k_id int4, weight )    11761700 rows
keywords(k_id, keyword)                    1072600 rows

A "fiche" is linked to any kind of document.
The engine table counts how many times a keyword appears in a document.

A query to search on one or two keywords is quick to execute (the 
front-end creates thoses queries):

---------------------------------------------------------------------
select count (distinct f.f_id) as results
FROM
fiches f
 INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
 INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)

WHERE TRUE
AND k1.keyword like 'maintenance%'
AND k2.keyword like 'exploitation%'
;

QUERY PLAN
Aggregate  (cost=3953.00..3953.00 rows=1 width=4) (actual 
time=525.243..525.243 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..3952.99 rows=1 width=4) (actual 
time=211.570..513.758 rows=6879 loops=1)
        ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) (actual 
time=211.483..389.340 rows=6879 loops=1)
              Hash Cond: ("outer".f_id = "inner".f_id)
              ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) 
(actual time=0.132..155.499 rows=9520 loops=1)
                    ->  Index Scan using keyword_pattern_key on keywords 
k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 
loops=1)
                          Index Cond: (((keyword)::text ~>=~ 
'exploitation'::character varying) AND ((keyword)::text ~<~ 
'exploitatioo'::character varying))
                          Filter: ((keyword)::text ~~ 'exploitation%'::text)
                    ->  Index Scan using k_id_key on engine e2  
(cost=0.00..1954.93 rows=1306 width=8) (actual time=0.049..1.842 
rows=127 loops=75)
                          Index Cond: (e2.k_id = "outer".k_id)
              ->  Hash  (cost=1974.76..1974.76 rows=11 width=4) (actual 
time=211.203..211.203 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 
width=4) (actual time=0.296..197.590 rows=11183 loops=1)
                          ->  Index Scan using keyword_pattern_key on 
keywords k1  (cost=0.00..3.51 rows=1 width=4) (actual time=0.189..1.351 
rows=73 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 
'maintenance'::character varying) AND ((keyword)::text ~<~ 
'maintenancf'::character varying))
                                Filter: ((keyword)::text ~~ 
'maintenance%'::text)
                          ->  Index Scan using k_id_key on engine e1  
(cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.406 
rows=153 loops=73)
                                Index Cond: (e1.k_id = "outer".k_id)
        ->  Index Scan using fiches_pkey on fiches f  (cost=0.00..3.36 
rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=6879)
              Index Cond: (f.f_id = "outer".f_id)
Total runtime: 525.511 ms
--------------------------------------------------------------------------


But when there are three keywords or more, the planner chooses to 
perform a very costly nested loop :

--------------------------------------------------------------------------
select count (distinct f.f_id) as results
FROM
fiches f
 INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
 INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)
 INNER JOIN engine e3 INNER JOIN keywords k3 USING (k_id) USING (f_id)

WHERE TRUE
AND k1.keyword like 'maintenance%'
AND k2.keyword like 'exploitation%'
AND k3.keyword like 'numerique%'
;

QUERY PLAN
Aggregate  (cost=5927.90..5927.90 rows=1 width=4) (actual 
time=673048.168..673048.169 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..5927.90 rows=1 width=4) (actual 
time=1853.789..673038.065 rows=2929 loops=1)
        ->  Nested Loop  (cost=1974.79..5924.52 rows=1 width=12) (actual 
time=1853.719..672881.725 rows=2929 loops=1)
              Join Filter: ("inner".f_id = "outer".f_id)
              ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) 
(actual time=198.845..441.947 rows=6879 loops=1)
                    Hash Cond: ("outer".f_id = "inner".f_id)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 
width=4) (actual time=0.129..199.895 rows=9520 loops=1)
                          ->  Index Scan using keyword_pattern_key on 
keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.077..1.918 
rows=75 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 
'exploitation'::character varying) AND ((keyword)::text ~<~ 
'exploitatioo'::character varying))
                                Filter: ((keyword)::text ~~ 
'exploitation%'::text)
                          ->  Index Scan using k_id_key on engine e2  
(cost=0.00..1954.93 rows=1306 width=8) (actual time=0.035..2.342 
rows=127 loops=75)
                                Index Cond: (e2.k_id = "outer".k_id)
                    ->  Hash  (cost=1974.76..1974.76 rows=11 width=4) 
(actual time=198.650..198.650 rows=0 loops=1)
                          ->  Nested Loop  (cost=0.00..1974.76 rows=11 
width=4) (actual time=0.174..187.216 rows=11183 loops=1)
                                ->  Index Scan using keyword_pattern_key 
on keywords k1  (cost=0.00..3.51 rows=1 width=4) (actual 
time=0.113..1.222 rows=73 loops=1)
                                      Index Cond: (((keyword)::text ~>=~ 
'maintenance'::character varying) AND ((keyword)::text ~<~ 
'maintenancf'::character varying))
                                      Filter: ((keyword)::text ~~ 
'maintenance%'::text)
                                ->  Index Scan using k_id_key on engine 
e1  (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.311 
rows=153 loops=73)
                                      Index Cond: (e1.k_id = "outer".k_id)
              ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) 
(actual time=0.087..90.165 rows=9553 loops=6879)
                    ->  Index Scan using keyword_pattern_key on keywords 
k3  (cost=0.00..3.51 rows=1 width=4) (actual time=0.049..0.628 rows=49 
loops=6879)
                          Index Cond: (((keyword)::text ~>=~ 
'numerique'::character varying) AND ((keyword)::text ~<~ 
'numeriquf'::character varying))
                          Filter: ((keyword)::text ~~ 'numerique%'::text)
                    ->  Index Scan using k_id_key on engine e3  
(cost=0.00..1954.93 rows=1306 width=8) (actual time=0.023..1.544 
rows=195 loops=337071)
                          Index Cond: (e3.k_id = "outer".k_id)
        ->  Index Scan using fiches_pkey on fiches f  (cost=0.00..3.36 
rows=1 width=4) (actual time=0.041..0.043 rows=1 loops=2929)
              Index Cond: (f.f_id = "outer".f_id)
Total runtime: 673048.405 ms
----------------------------------------------------------------------
More than 10 minutes !

Is there a specific reason the planner chooses this way ?
Can whe do something on the postgresql configuration to avoid this ?
Can whe force the planner to use a hash join as it does for the first 
joins ?

Regards,
Antoine Bajolet



Responses

pgsql-performance by date

Next:From: Thomas F. O'ConnellDate: 2005-09-19 23:00:33
Subject: Index Selection: ORDER BY vs. PRIMARY KEY
Previous:From: Martin NickelDate: 2005-09-17 01:34:14
Subject: How can this be?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group