Poor performance o

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Poor performance o
Date: 2006-03-21 22:40:17
Message-ID: 442080D1.5090803@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I'm reposting this -- I sent this out a month ago but never got a response, and hope someone can shed some light on this.

Thanks,
Craig

--------------------------

This is a straightforward query that should be fairly quick, but takes about 30 minutes. It's a query across three tables, call them A, B, and C. The tables are joined on indexed columns.

Here's a quick summary:

Table A -----> Table B -----> Table C
A_ID B_ID C_ID
A_ID NAME
C_ID

Tables A and B have 6 million rows each. Table C is small: 67 names, no repeats. All columns involved in the join are indexed. The database has been full-vacuumed and analyzed.

Summary:

1. Query B only: 2.7 seconds, 302175 rows returned
2. Join B and C: 4.3 seconds, exact same answer
3. Join A and B: 7.2 minutes, exact same answer
4. Join A, B, C: 32.7 minutes, exact same answer

Looking at these:

Query #1 is doing the real work: finding the rows of interest.

Queries #1 and #2 ought to be virtually identical, since Table C has
just one row with C_ID = 9, but the time almost doubles.

Query #3 should take a bit longer than Query #1 because it has to join
300K rows, but the indexes should make this take just a few seconds,
certainly well under a minute.

Query #4 should be identical to Query #3, again because there's only
one row in Table C. 32 minutes is pretty horrible for such a
straightforward query.

It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at query planning.

This is psql 8.0.3. Table definitions are at the end. Hardware is a Dell, 2-CPU Xeon, 4 GB memory, database is on a single SATA 7200RPM disk.

These table and column names are altered to protect the guilty, otherwise these are straight from Postgres.

QUERY #1:
---------

explain analyze select B.A_ID from B where B.B_ID = 9;

Index Scan using i_B_B_ID on B (cost=0.00..154401.36 rows=131236 width=4) (actual time=0.158..1387.251 rows=302175 loops=1)
Index Cond: (B_ID = 9)
Total runtime: 2344.053 ms

QUERY #2:
---------

explain analyze select B.A_ID from B join C on (B.C_ID = C.C_ID) where C.name = 'Joe';

Nested Loop (cost=0.00..258501.92 rows=177741 width=4) (actual time=0.349..3392.532 rows=302175 loops=1)
-> Seq Scan on C (cost=0.00..12.90 rows=1 width=4) (actual time=0.232..0.336 rows=1 loops=1)
Filter: ((name)::text = 'Joe'::text)
-> Index Scan using i_B_C_ID on B (cost=0.00..254387.31 rows=328137 width=8) (actual time=0.102..1290.002 rows=302175 loops=1)
Index Cond: (B.C_ID = "outer".C_ID)
Total runtime: 4373.916 ms

QUERY #3:
---------

explain analyze
select A.A_ID from A
join B on (A.A_ID = B.A_ID) where B.B_ID = 9;

Nested Loop (cost=0.00..711336.41 rows=131236 width=4) (actual time=37.118..429419.347 rows=302175 loops=1)
-> Index Scan using i_B_B_ID on B (cost=0.00..154401.36 rows=131236 width=4) (actual time=27.344..8858.489 rows=302175 loops=1)
Index Cond: (B_ID = 9)
-> Index Scan using pk_A_test on A (cost=0.00..4.23 rows=1 width=4) (actual time=1.372..1.376 rows=1 loops=302175)
Index Cond: (A.A_ID = "outer".A_ID)
Total runtime: 430467.686 ms

QUERY #4:
---------
explain analyze
select A.A_ID from A
join B on (A.A_ID = B.A_ID)
join C on (B.B_ID = C.B_ID)
where C.name = 'Joe';

Nested Loop (cost=0.00..1012793.38 rows=177741 width=4) (actual time=70.184..1960112.247 rows=302175 loops=1)
-> Nested Loop (cost=0.00..258501.92 rows=177741 width=4) (actual time=52.114..17753.638 rows=302175 loops=1)
-> Seq Scan on C (cost=0.00..12.90 rows=1 width=4) (actual time=0.109..0.176 rows=1 loops=1)
Filter: ((name)::text = 'Joe'::text)
-> Index Scan using i_B_B_ID on B (cost=0.00..254387.31 rows=328137 width=8) (actual time=51.985..15566.896 rows=302175 loops=1)
Index Cond: (B.B_ID = "outer".B_ID)
-> Index Scan using pk_A_test on A (cost=0.00..4.23 rows=1 width=4) (actual time=6.407..6.412 rows=1 loops=302175)
Index Cond: (A.A_ID = "outer".A_ID)
Total runtime: 1961200.079 ms

TABLE DEFINITIONS:
------------------

xxx => \d a
Table "xxx.a"
Column | Type | Modifiers
------------------+------------------------+-----------
a_id | integer | not null
... more columns

Indexes:
"pk_a_id" PRIMARY KEY, btree (a_id)
... more indexes on other columns

xxx => \d b
Table "xxx.b"
Column | Type | Modifiers
-------------------------+------------------------+-----------
b_id | integer | not null
a_id | integer | not null
c_id | integer | not null
... more columns

Indexes:
"b_pkey" PRIMARY KEY, btree (b_id)
"i_b_a_id" btree (a_id)
"i_b_c_id" btree (c_id)

xxx=> \d c
Table "xxx.c"
Column | Type | Modifiers
--------------+------------------------+-----------
c_id | integer | not null
name | character varying(200) |
... more columns

Indexes:
"c_pkey" PRIMARY KEY, btree (c_id)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-03-21 22:44:22 Re: [GENERAL] A real currency type
Previous Message Martijn van Oosterhout 2006-03-21 22:28:23 Re: Automatically setting work_mem

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-03-21 23:33:15 Re: Poor performance o
Previous Message Scott Marlowe 2006-03-21 22:23:14 Re: Migration study, step 1: bulk write performance