Skip site navigation (1) Skip section navigation (2)

Re: Slow SQL query (14-15 seconds)

From: Bruno Baguette <bruno(dot)baguette(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow SQL query (14-15 seconds)
Date: 2008-11-13 14:19:42
Message-ID: 491C377E.3070904@gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Le 13/11/08 14:31, Tom Lane a écrit :
> It's the repeatedly executed EXISTS subplan that's hurting you:
> 
>>                             SubPlan
>>                               ->  Nested Loop  (cost=35.56..378.16 
>> rows=2 width=0) (actual time=16.511..16.511 rows=0 loops=818)
> 
> 16.511 * 818 = 13505.998, so this is all but about 100 msec of the
> runtime.  Can't tell if there's any easy way to improve it.  In
> pre-8.4 releases trying to convert the EXISTS into an IN might help.

Hello Tom !

If I replace the EXISTS by a IN subquery, it falls from 14-15 seconds to 
5 seconds !

####################################
AND EXISTS (
              SELECT 1 FROM commandes
              INNER JOIN clients ON commandes.fk_client_id = 
clients.pk_client_id
              INNER JOIN societes AS societe_client ON 
clients.fk_societe_id = societe_client.pk_societe_id
              WHERE delivery_date_livraison BETWEEN (NOW() - '1 
year'::interval) AND NOW() AND societe_client.pk_societe_id = 
societes.pk_societe_id
            )
####################################

replaced by a IN subquery

####################################
AND societes.pk_societe_id IN (
                                 SELECT societes.pk_societe_id
                                 FROM commandes
                                 INNER JOIN clients ON 
commandes.fk_client_id = clients.pk_client_id
                                 INNER JOIN societes AS societe_client 
ON clients.fk_societe_id = societe_client.pk_societe_id
                                 WHERE delivery_date_livraison BETWEEN 
(NOW() - '1 year'::interval) AND NOW()
                               )
####################################

Heres's the EXPLAIN ANALYZE of the new SQL query :


####################################
 
            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=280995.27..280996.30 rows=414 width=147) (actual 
time=5164.297..5165.638 rows=818 loops=1)
    Sort Key: lower((societes.denomination_commerciale)::text)
    ->  Hash Left Join  (cost=697.38..280977.27 rows=414 width=147) 
(actual time=110.093..5156.853 rows=818 loops=1)
          Hash Cond: ("outer".pk_societe_id = "inner".societe_id)
          ->  Merge Left Join  (cost=642.64..280817.00 rows=414 
width=139) (actual time=98.886..5141.305 rows=818 loops=1)
                Merge Cond: ("outer".pk_societe_id = "inner".societe_id)
                ->  Merge Left Join  (cost=551.92..280717.18 rows=414 
width=131) (actual time=87.278..5123.133 rows=818 loops=1)
                      Merge Cond: ("outer".pk_societe_id = 
"inner".societe_id)
                      ->  Index Scan using pkey_societe_id on societes 
(cost=0.00..280155.54 rows=414 width=123) (actual time=21.748..5051.976 
rows=818 loops=1)
                            Filter: ((NOT is_deleted) AND (subplan))
                            SubPlan
                              ->  Hash Join  (cost=170.88..438.17 
rows=2298 width=0) (actual time=6.165..6.165 rows=1 loops=818)
                                    Hash Cond: ("outer".fk_client_id = 
"inner".pk_client_id)
                                    ->  Bitmap Heap Scan on commandes 
(cost=35.66..266.10 rows=2775 width=8) (actual time=6.144..6.144 rows=1 
loops=818)
                                          Recheck Cond: 
((delivery_date_livraison >= (now() - '1 year'::interval)) AND 
(delivery_date_livraison <= now()))
                                          ->  Bitmap Index Scan on 
idx_date_livraison  (cost=0.00..35.66 rows=2775 width=0) (actual 
time=6.121..6.121 rows=3109 loops=818)
                                                Index Cond: 
((delivery_date_livraison >= (now() - '1 year'::interval)) AND 
(delivery_date_livraison <= now()))
                                    ->  Hash  (cost=132.46..132.46 
rows=1105 width=8) (actual time=13.573..13.573 rows=1082 loops=1)
                                          ->  Hash Join 
(cost=48.39..132.46 rows=1105 width=8) (actual time=3.933..11.246 
rows=1082 loops=1)
                                                Hash Cond: 
("outer".fk_societe_id = "inner".pk_societe_id)
                                                ->  Seq Scan on clients 
  (cost=0.00..66.35 rows=1335 width=16) (actual time=0.004..2.623 
rows=1308 loops=1)
                                                ->  Hash 
(cost=46.11..46.11 rows=911 width=8) (actual time=3.900..3.900 rows=903 
loops=1)
                                                      ->  Seq Scan on 
societes societe_client  (cost=0.00..46.11 rows=911 width=8) (actual 
time=0.004..1.947 rows=903 loops=1)
                      ->  Sort  (cost=551.92..554.20 rows=911 width=16) 
(actual time=65.518..66.453 rows=563 loops=1)
                            Sort Key: stats_commandes.societe_id
                            ->  Subquery Scan stats_commandes 
(cost=486.64..507.14 rows=911 width=16) (actual time=61.034..64.117 
rows=563 loops=1)
                                  ->  HashAggregate 
(cost=486.64..498.03 rows=911 width=16) (actual time=61.028..62.177 
rows=563 loops=1)
                                        ->  Hash Join 
(cost=135.22..458.94 rows=5539 width=16) (actual time=13.517..48.643 
rows=5971 loops=1)
                                              Hash Cond: 
("outer".fk_client_id = "inner".pk_client_id)
                                              ->  Seq Scan on commandes 
  (cost=0.00..234.90 rows=6690 width=16) (actual time=0.004..11.951 
rows=5971 loops=1)
                                              ->  Hash 
(cost=132.46..132.46 rows=1105 width=16) (actual time=13.486..13.486 
rows=1082 loops=1)
                                                    ->  Hash Join 
(cost=48.39..132.46 rows=1105 width=16) (actual time=3.827..11.123 
rows=1082 loops=1)
                                                          Hash Cond: 
("outer".fk_societe_id = "inner".pk_societe_id)
                                                          ->  Seq Scan 
on clients  (cost=0.00..66.35 rows=1335 width=16) (actual 
time=0.003..2.566 rows=1308 loops=1)
                                                          ->  Hash 
(cost=46.11..46.11 rows=911 width=8) (actual time=3.802..3.802 rows=903 
loops=1)
                                                                ->  Seq 
Scan on societes  (cost=0.00..46.11 rows=911 width=8) (actual 
time=0.004..1.906 rows=903 loops=1)
                ->  Sort  (cost=90.72..92.83 rows=844 width=16) (actual 
time=11.566..13.070 rows=885 loops=1)
                      Sort Key: stats_adresses_livraison.societe_id
                      ->  Subquery Scan stats_adresses_livraison 
(cost=30.71..49.70 rows=844 width=16) (actual time=4.504..9.357 rows=885 
loops=1)
                            ->  HashAggregate  (cost=30.71..41.26 
rows=844 width=16) (actual time=4.499..6.304 rows=885 loops=1)
                                  ->  Seq Scan on 
societes_adresses_livraison  (cost=0.00..25.90 rows=962 width=16) 
(actual time=0.005..2.221 rows=991 loops=1)
                                        Filter: (NOT is_deleted)
          ->  Hash  (cost=52.48..52.48 rows=903 width=16) (actual 
time=11.164..11.164 rows=903 loops=1)
                ->  Subquery Scan stats_adresses_facturation 
(cost=32.16..52.48 rows=903 width=16) (actual time=4.339..9.220 rows=903 
loops=1)
                      ->  HashAggregate  (cost=32.16..43.45 rows=903 
width=16) (actual time=4.334..6.116 rows=903 loops=1)
                            ->  Seq Scan on 
societes_adresses_facturation  (cost=0.00..27.25 rows=983 width=16) 
(actual time=0.006..2.128 rows=943 loops=1)
                                  Filter: (NOT is_deleted)
  Total runtime: 5167.896 ms
(48 lignes)

####################################


Many thanks for the help, that's already better (3x time faster) !

Can you explain why a IN is fastest than an EXISTS subquery ? Until now, 
I was thinking that IN would require PostgreSQL to scan all the table 
(from the beginning to the end) and that EXISTS would require to scan 
all the table (from the beginning until getting one match).

Do you think I can improve again the performance of that query ? I 
expected more speed since theses are little tables

delivery=> SELECT COUNT(*) FROM societes;
  count
-------
    903
(1 ligne)

delivery=> SELECT COUNT(*) FROM clients;
  count
-------
   1308
(1 ligne)

delivery=> SELECT COUNT(*) FROM commandes;
  count
-------
   5972
(1 ligne)


One reader told me Gmail was guilty for cutting the lines, so I've put a 
copy of the query plan on pastebin.com to keep it readable : 
<http://pastebin.com/m6434f639>

Thanks in advance for any tips !

Regards,

-- 
Bruno Baguette

In response to

Responses

pgsql-performance by date

Next:From: Vladimir SitnikovDate: 2008-11-13 14:29:10
Subject: Re: Slow SQL query (14-15 seconds)
Previous:From: Tom LaneDate: 2008-11-13 13:31:50
Subject: Re: Slow SQL query (14-15 seconds)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group