Skip site navigation (1) Skip section navigation (2)

Re: Cost Issue - How do I force a Hash Join

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Cost Issue - How do I force a Hash Join
Date: 2006-02-21 05:54:14
Message-ID: 43FAAB06.5030901@modgraph-usa.com (view raw or flat)
Thread:
Lists: pgsql-performance
"Virag Saksena" <virag(at)auptyma(dot)com> writes:
>> The individual queries run in 50-300 ms. However the optimizer is
>> choosing a nested loop to join them rather than a Hash join...

I have what appears to be the identical problem.

This is a straightforward query that should be fairly quick, but takes about 30 minutes.  It's a query across three tables, call them A, B, and C.  The tables are joined on indexed columns.

Here's a quick summary:

   Table A -----> Table B -----> Table C
     A_ID           B_ID           C_ID
                    A_ID           NAME
                    C_ID

Tables A and B have 6 million rows each.  Table C is small: 67 names, no repeats.  All columns involved in the join are indexed.

Summary: 
   1. Query B only:    2.7 seconds, 302175 rows returned
   2. Join B and C:    4.3 seconds, exact same answer
   3. Join A and B:    7.2 minutes, exact same answer
   4. Join A, B, C:    32.7 minutes, exact same answer

Looking at these:

   Query #1 is doing the real work: finding the rows of interest.

   Queries #1 and #2 ought to be virtually identical, since Table C has
   just one row with C_ID = 9, but the time almost doubles.

   Query #3 should take a bit longer than Query #1 because it has to join
   300K rows, but the indexes should make this take just a few seconds,
   certainly well under a minute.  

   Query #4 should be identical to Query #3, again because there's only
   one row in Table C.  32 minutes is pretty horrible for such a
   straightforward query.

It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at query planning.

This is psql 8.0.3.  Table definitions are at the end.  (Table and column names are altered to protect the guilty, otherwise these are straight from Postgres.)  I ran "vacuum full analyze" after the last data were added.  Hardware is a Dell, 2-CPU Xeon, 4 GB memory, database is on a single SATA 7200RPM disk.

Thanks,
Craig

----------------------------


QUERY #1:
---------

explain analyze select B.A_ID from B where B.B_ID = 9;

 Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=0.158..1387.251 rows=302175 loops=1)
   Index Cond: (B_ID = 9)
 Total runtime: 2344.053 ms


QUERY #2:
---------

explain analyze select B.A_ID from B join C on (B.C_ID = C.C_ID) where C.name = 'Joe';

 Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=0.349..3392.532 rows=302175 loops=1)
   ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.232..0.336 rows=1 loops=1)
         Filter: ((name)::text = 'Joe'::text)
   ->  Index Scan using i_B_C_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=0.102..1290.002 rows=302175 loops=1)
         Index Cond: (B.C_ID = "outer".C_ID)
 Total runtime: 4373.916 ms


QUERY #3:
---------

explain analyze
   select A.A_ID from A
     join B on (A.A_ID = B.A_ID) 
    where B.B_ID = 9;

 Nested Loop  (cost=0.00..711336.41 rows=131236 width=4) (actual time=37.118..429419.347 rows=302175 loops=1)
   ->  Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=27.344..8858.489 rows=302175 loops=1)
         Index Cond: (B_ID = 9)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=1.372..1.376 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 430467.686 ms


QUERY #4:
---------
explain analyze
   select A.A_ID from A
     join B on (A.A_ID = B.A_ID)
     join C on (B.B_ID = C.B_ID)
     where C.name = 'Joe';

 Nested Loop  (cost=0.00..1012793.38 rows=177741 width=4) (actual time=70.184..1960112.247 rows=302175 loops=1)
   ->  Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=52.114..17753.638 rows=302175 loops=1)
         ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.109..0.176 rows=1 loops=1)
               Filter: ((name)::text = 'Joe'::text)
         ->  Index Scan using i_B_B_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=51.985..15566.896 rows=302175 loops=1)
               Index Cond: (B.B_ID = "outer".B_ID)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=6.407..6.412 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 1961200.079 ms


TABLE DEFINITIONS:
------------------

xxx => \d a
                Table "xxx.a"
      Column       |          Type          | Modifiers 
-------------------+------------------------+-----------
 a_id              | integer                | not null
 ... more columns

Indexes:
    "pk_a_id" PRIMARY KEY, btree (a_id)
    ... more indexes on other columns

xxx => \d b
                    Table "xxx.b"
          Column          |          Type          | Modifiers 
--------------------------+------------------------+-----------
 b_id                     | integer                | not null
 a_id                     | integer                | not null
 c_id                     | integer                | not null
 ... more columns

Indexes:
    "b_pkey" PRIMARY KEY, btree (b_id)
    "i_b_a_id" btree (a_id)
    "i_b_c_id" btree (c_id)


xxx=> \d c
          Table "xxx.c"
    Column     |          Type          | Modifiers 
---------------+------------------------+-----------
 c_id          | integer                | not null
 name          | character varying(200) | 
 ... more columns
Indexes:
    "c_pkey" PRIMARY KEY, btree (c_id)

In response to

pgsql-performance by date

Next:From: Virag SaksenaDate: 2006-02-21 06:33:39
Subject: Re: Cost Issue - How do I force a Hash Join
Previous:From: Tom LaneDate: 2006-02-21 05:35:55
Subject: Re: Cost Issue - How do I force a Hash Join

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group