query faster using LEFT OUTER join?

From: Drew <drewmwilson(at)fastmail(dot)fm>
To: pgsql-sql(at)postgresql(dot)org
Subject: query faster using LEFT OUTER join?
Date: 2006-11-17 21:16:03
Message-ID: B8825A6D-229D-4130-9091-7386F2B383DD@fastmail.fm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a planner question about a multi-join query, where if I
rewrite the query excluding the table that is left joined to the
other tables my query speed increases by 75%.

I've tested this query on both 8.0.9 and 8.2b3, w/ similar results
(except 8.2b3 is 50% faster!)

I'm wondering why the first query is planned a faster way and if
there is anything I could do to my 2nd query faster (which should be
faster because it doesn't have to scan the first table).

Here's my initial query that runs in 22.961ms:
SELECT
train_family_id, object_type, COUNT(*) AS frequency
FROM
translation_pair tp
LEFT OUTER JOIN instance i USING(translation_pair_id)
JOIN context c USING(context_id)
JOIN loc_submission ls USING(loc_submission_id)
JOIN loc_submission_train_info lsti USING(loc_submission_id)
WHERE
translation_pair_id IN (640352, 6144, 1023028, 18155, 240244,
50157)
GROUP BY
train_family_id,
object_type;

If I just remove the translation_pair table and join directly against
the instance table, the query speed drops 50% to 36.6ms
SELECT
train_family_id, object_type, COUNT(*) AS frequency
FROM
instance i
JOIN context c USING(context_id)
JOIN loc_submission ls USING(loc_submission_id)
JOIN loc_submission_train_info lsti USING(loc_submission_id)
WHERE
translation_pair_id IN (640352, 6144, 1023028, 18155, 240244,
50157)
GROUP BY
train_family_id,
object_type;

Looking at the query plans for the 2 of them, it looks like the inner
"Bitmap Index Scan on instance_translation_pair_id" is much slower
when using "IN (x,y,z)" instead of comparing against table
(tp.translation_pair_id = i.translation_pair_id.

Is there anything else I should notice? Can I force the planner to
use the translation_pair table first?

Here's the plan for the first query (using 8.2b3)
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------------------
HashAggregate (cost=3223.17..3223.41 rows=19 width=18) (actual
time=34.913..34.978 rows=122 loops=1)
-> Nested Loop Left Join (cost=29.53..3223.03 rows=19 width=18)
(actual time=0.606..33.994 rows=770 loops=1)
-> Nested Loop Left Join (cost=29.53..3108.80 rows=19
width=22) (actual time=0.552..26.918 rows=770 loops=1)
-> Nested Loop Left Join (cost=29.53..2995.26
rows=19 width=18) (actual time=0.497..19.924 rows=768 loops=1)
-> Nested Loop Left Join (cost=29.53..2879.59
rows=19 width=8) (actual time=0.431..6.395 rows=768 loops=1)
-> Bitmap Heap Scan on translation_pair
tp (cost=24.27..48.10 rows=6 width=4) (actual time=0.304..0.376
rows=6 loops=1)
Recheck Cond: (translation_pair_id
= ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[]))
-> Bitmap Index Scan on
translation_pair_pkey (cost=0.00..24.27 rows=6 width=0) (actual
time=0.254..0.254 rows=6 loops=1)
Index Cond:
(translation_pair_id = ANY
('{640352,6144,1023028,18155,240244,50157}'::oid[]))
-> Bitmap Heap Scan on instance i
(cost=5.26..470.38 rows=123 width=12) (actual time=0.119..0.922
rows=128 loops=6)
Recheck Cond:
(tp.translation_pair_id = i.translation_pair_id)
-> Bitmap Index Scan on
instance_translation_pair_id (cost=0.00..5.26 rows=123 width=0)
(actual time=0.098..0.098 rows=128 loops=6)
Index Cond:
(tp.translation_pair_id = i.translation_pair_id)
-> Index Scan using context_pkey on context c
(cost=0.00..6.07 rows=1 width=18) (actual time=0.015..0.016 rows=1
loops=768)
Index Cond: (i.context_id = c.context_id)
-> Index Scan using loc_submission_train_info_pkey
on loc_submission_train_info lsti (cost=0.00..5.96 rows=1 width=8)
(actual time=0.007..0.008 rows=1 loops=768)
Index Cond: (i.loc_submission_id =
lsti.loc_submission_id)
-> Index Scan using loc_submission_pkey on loc_submission
ls (cost=0.00..6.00 rows=1 width=4) (actual time=0.007..0.008 rows=1
loops=770)
Index Cond: (i.loc_submission_id = ls.loc_submission_id)

And here's the plan for the second query without the first LEFT OUTER
join (using 8.2b3)
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------
HashAggregate (cost=8927.87..8936.89 rows=721 width=18) (actual
time=39.484..39.551 rows=122 loops=1)
-> Nested Loop Left Join (cost=3706.71..8922.47 rows=721
width=18) (actual time=25.246..38.654 rows=770 loops=1)
-> Hash Left Join (cost=3706.71..4000.93 rows=721
width=8) (actual time=25.215..31.169 rows=770 loops=1)
Hash Cond: (i.loc_submission_id = ls.loc_submission_id)
-> Merge Left Join (cost=3420.14..3481.22 rows=721
width=12) (actual time=15.088..20.018 rows=770 loops=1)
Merge Cond: (i.loc_submission_id =
lsti.loc_submission_id)
-> Sort (cost=2576.82..2578.62 rows=720
width=8) (actual time=2.287..2.438 rows=768 loops=1)
Sort Key: i.loc_submission_id
-> Bitmap Heap Scan on instance i
(cost=25.24..2542.65 rows=720 width=8) (actual time=0.381..1.623
rows=768 loops=1)
Recheck Cond: (translation_pair_id
= ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[]))
-> Bitmap Index Scan on
instance_translation_pair_id (cost=0.00..25.24 rows=720 width=0)
(actual time=0.293..0.293 rows=768 loops=1)
Index Cond:
(translation_pair_id = ANY
('{640352,6144,1023028,18155,240244,50157}'::oid[]))
-> Sort (cost=843.32..868.47 rows=10059
width=8) (actual time=12.782..14.312 rows=10530 loops=1)
Sort Key: lsti.loc_submission_id
-> Seq Scan on loc_submission_train_info
lsti (cost=0.00..174.59 rows=10059 width=8) (actual
time=0.012..3.708 rows=10059 loops=1)
-> Hash (cost=261.46..261.46 rows=10046 width=4)
(actual time=10.061..10.061 rows=10046 loops=1)
-> Seq Scan on loc_submission ls
(cost=0.00..261.46 rows=10046 width=4) (actual time=0.013..5.235
rows=10046 loops=1)
-> Index Scan using context_pkey on context c
(cost=0.00..6.81 rows=1 width=18) (actual time=0.008..0.009 rows=1
loops=770)
Index Cond: (i.context_id = c.context_id)

Thanks for your help,

Drew

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Drew 2006-11-17 21:17:05 Re: How convert UNICODE
Previous Message lms 2006-11-17 17:09:37 How convert UNICODE