slow joining very large table to smaller ones

From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: slow joining very large table to smaller ones
Date: 2005-07-14 08:05:23
Message-ID: 46AF8F35-B735-4DFA-BA0A-AC82B5DA3CCE@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm trying to improve the speed of this query:

explain select recordtext from eventactivity inner join ( select
incidentid from k_r where id = 94 ) a using ( incidentid ) inner join
( select incidentid from k_b where id = 107 ) b using ( incidentid );
QUERY PLAN
------------------------------------------------------------------------
--------------------------------------
Merge Join (cost=2747.29..4249364.96 rows=11968693 width=35)
Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
-> Merge Join (cost=1349.56..4230052.73 rows=4413563 width=117)
Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
-> Index Scan using eventactivity1 on eventactivity
(cost=0.00..4051200.28 rows=44519781 width=49)
-> Sort (cost=1349.56..1350.85 rows=517 width=68)
Sort Key: (k_b.incidentid)::text
-> Index Scan using k_b_idx on k_b
(cost=0.00..1326.26 rows=517 width=68)
Index Cond: (id = 107)
-> Sort (cost=1397.73..1399.09 rows=542 width=68)
Sort Key: (k_r.incidentid)::text
-> Index Scan using k_r_idx on k_r (cost=0.00..1373.12
rows=542 width=68)
Index Cond: (id = 94)
(13 rows)

There are many millions of rows in eventactivity. There are a few
ten-thousand rows in k_r and k_b. There is an index on 'incidentid'
in all three tables. There should only be less than 100 rows matched
in k_r and k_b total. That part on its own is very very fast. But,
it should have those 100 or so incidentids extracted in under a
second and then go into eventactivity AFTER doing that. At least,
that's my intention to make this fast.

Right now, it looks like pg is trying to sort the entire
eventactivity table for the merge join which is taking several
minutes to do. Can I rephrase this so that it does the searching
through k_r and k_b FIRST and then go into eventactivity using the
index on incidentid? It seems like that shouldn't be too hard to
make fast but my SQL query skills are only average.

Thanks
-Dan

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message mudfoot 2005-07-14 09:56:33 Re: JFS fastest filesystem for PostgreSQL?
Previous Message Dawid Kuroczko 2005-07-14 08:03:10 Re: JFS fastest filesystem for PostgreSQL?