Re: Sequential scan instead of index scan

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential scan instead of index scan
Date: 2012-08-06 22:04:10
Message-ID: 50203F5A.3040805@anatec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 06/08/2012 16:34, Tom Lane wrote:
> Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com> writes:
>> I think this is a pretty good plan and quite quick given the
>> size of the table (88Million rows at present). However in real
>> life the parameter where I search for msg_id is not an array of
>> 3 ids but of 300.000 or more. It is then that the query forgets
>> the plan and goes to sequential scan. Is there any way around?
> If you've got that many, any(array[....]) is a bad choice. I'd try
> putting the IDs into a VALUES(...) list, or even a temporary table, and
> then writing the query as a join. It is a serious mistake to think that
> a seqscan is evil when you're dealing with joining that many rows, btw.
> What you should probably be looking for is a hash join plan.
>
> regards, tom lane
Ok in that scenario we are back to square one. Following your suggestion
my resultant query is this (the temporary table is tmp_tbl_messages)
select
ship_pos_messages.*
from
feed_all_y2012m08.ship_pos_messages join tmp_tbl_messages
on (ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)
where
extract('day' from msg_date_rec) = 1
AND date_trunc('day', msg_date_rec) = '2012-08-01';

which gives us the following explain analyse:

"Merge Join (cost=1214220.48..3818359.46 rows=173574357 width=128)
(actual time=465036.958..479089.731 rows=341190 loops=1)"
" Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id =
tmp_tbl_messages.msg_id)"
" -> Sort (cost=1178961.70..1179223.51 rows=104725 width=128) (actual
time=464796.971..476579.208 rows=19512873 loops=1)"
" Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"
" Sort Method: external merge Disk: 1254048kB"
" -> Append (cost=0.00..1170229.60 rows=104725 width=128)
(actual time=0.033..438682.971 rows=19512883 loops=1)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1
width=100) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
1::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-01 00:00:00'::timestamp without time zone))"
" -> Seq Scan on ship_a_pos_messages ship_pos_messages
(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0
loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
1::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-01 00:00:00'::timestamp without time zone))"
" -> Index Scan using
idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages
ship_pos_messages (cost=0.00..58657.09 rows=5269 width=128) (actual
time=0.032..799.171 rows=986344 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Index Scan using
idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages
ship_pos_messages (cost=0.00..1694.64 rows=141 width=128) (actual
time=0.026..20.661 rows=26979 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Index Scan using
idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0
ship_pos_messages (cost=0.00..1109877.86 rows=99313 width=128) (actual
time=0.029..435784.376 rows=18499560 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Sort (cost=35258.79..36087.50 rows=331486 width=8) (actual
time=239.908..307.576 rows=349984 loops=1)"
" Sort Key: tmp_tbl_messages.msg_id"
" Sort Method: quicksort Memory: 28694kB"
" -> Seq Scan on tmp_tbl_messages (cost=0.00..4863.86
rows=331486 width=8) (actual time=0.047..55.227 rows=349984 loops=1)"
"Total runtime: 479336.869 ms"

Which is a Merge join and not a hash. Any ideas how to make it a hash join?

Kind Regards
Yiannis

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-08-07 03:13:27 Re: Sequential scan instead of index scan
Previous Message Midge Brown 2012-08-06 18:36:43 Re: slow query, different plans