Why would this slow the query down so much?

From: Stuart Grimshaw <nospam(at)smgsystems(dot)co(dot)uk>
To: pgsql-sql(at)postgresql(dot)org
Subject: Why would this slow the query down so much?
Date: 2001-10-10 12:35:27
Message-ID: xgXw7.2205$uU2.85101@wards
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have 3 tables that I am trying to join together:

------------------------------------------
Table "caturljoin"
Attribute | Type |
-----------+---------+
category | integer |
url | integer |
Index: caturljoin_url
caturljoin_cat

Table "stories"
Attribute | Type |
-------------+------------------------+
urn | integer |
headline | character varying |
author | character varying |
source | integer |
story | text |
added | date |
description | character varying |
displayall | smallint |
fullurl | character varying(255) |
publish | smallint |
error | integer |
sourceurl | character varying(255) |
sourcename | character varying(100) |
rank | smallint |
Indices: stories_added,
stories_source,
stories_unique_story,
stories_urn_key

Table "urllist"
Attribute | Type |
--------------+------------------------+
urn | integer |
url | character varying(255) |
friendlyname | character varying(30) |
homepage | character varying(255) |
method | smallint |
script | character varying(20) |
params | character varying(500) |
collect | smallint |
section | smallint |
index_script | character varying |
regexp | character varying(100) |
baseurl | character varying(75) |
Index: urllist_urn
------------------------------------------

With the following SQL:

------------------------------------------
SELECT a.category, b.headline, b.added, c.friendlyname
FROM caturljoin as a
INNER JOIN stories as b ON (a.url = b.source)
INNER JOIN urllist as c ON (a.url = d.urn)
WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
------------------------------------------

The results of explain for the above are:

------------------------------------------
psql:scratch.sql:5: NOTICE: QUERY PLAN:

Limit (cost=1587.30..1587.30 rows=1 width=44)
-> Sort (cost=1587.30..1587.30 rows=1 width=44)
-> Merge Join (cost=249.89..1587.29 rows=1 width=44)
-> Sort (cost=249.89..249.89 rows=409 width=28)
-> Nested Loop (cost=0.00..232.15 rows=409 width=28)
-> Index Scan using caturljoin_cat on caturljoin
a (cost=0.00..5.09 rows=7 width=8)
-> Index Scan using stories_source on stories b
(cost=0.00..34.41 rows=29 width=20)
-> Index Scan using urllist_urn on urllist c
(cost=0.00..1323.69 rows=505 width=16)

EXPLAIN
------------------------------------------

and as you might be able to guess the query takes an age to complete.

If I remove the table urllist from the query, I get a much better response:

------------------------------------------
psql:scratch.sql:4: NOTICE: QUERY PLAN:

Limit (cost=0.00..207.74 rows=1 width=28)
-> Nested Loop (cost=0.00..84945.18 rows=409 width=28)
-> Index Scan Backward using stories_added on stories b
(cost=0.00..2310.04 rows=16149 width=20)
-> Index Scan using caturljoin_url on caturljoin a
(cost=0.00..5.10 rows=1 width=8)

EXPLAIN
------------------------------------------

Currently the tables contain the following rows of data:

------------------------------------------
caturljoin: 653 rows
urllist: 505 rows
stories: 21554 rows
------------------------------------------

Can anyone tell me why the inclusion of urllist would slow it down so much,
and what can I do to improve the speed of the query?

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Gerardo Perosio 2001-10-10 13:06:01 SELECT FOR UPDATE CLAUSE
Previous Message Allan Engelhardt 2001-10-09 23:16:42 Re: ROUND function ??