From: | Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Sequential scan instead of index scan |
Date: | 2012-08-06 22:04:10 |
Message-ID: | 50203F5A.3040805@anatec.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 06/08/2012 16:34, Tom Lane wrote:
> Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com> writes:
>> I think this is a pretty good plan and quite quick given the
>> size of the table (88Million rows at present). However in real
>> life the parameter where I search for msg_id is not an array of
>> 3 ids but of 300.000 or more. It is then that the query forgets
>> the plan and goes to sequential scan. Is there any way around?
> If you've got that many, any(array[....]) is a bad choice. I'd try
> putting the IDs into a VALUES(...) list, or even a temporary table, and
> then writing the query as a join. It is a serious mistake to think that
> a seqscan is evil when you're dealing with joining that many rows, btw.
> What you should probably be looking for is a hash join plan.
>
> regards, tom lane
Ok in that scenario we are back to square one. Following your suggestion
my resultant query is this (the temporary table is tmp_tbl_messages)
select
ship_pos_messages.*
from
feed_all_y2012m08.ship_pos_messages join tmp_tbl_messages
on (ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)
where
extract('day' from msg_date_rec) = 1
AND date_trunc('day', msg_date_rec) = '2012-08-01';
which gives us the following explain analyse:
"Merge Join (cost=1214220.48..3818359.46 rows=173574357 width=128)
(actual time=465036.958..479089.731 rows=341190 loops=1)"
" Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id =
tmp_tbl_messages.msg_id)"
" -> Sort (cost=1178961.70..1179223.51 rows=104725 width=128) (actual
time=464796.971..476579.208 rows=19512873 loops=1)"
" Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"
" Sort Method: external merge Disk: 1254048kB"
" -> Append (cost=0.00..1170229.60 rows=104725 width=128)
(actual time=0.033..438682.971 rows=19512883 loops=1)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1
width=100) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
1::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-01 00:00:00'::timestamp without time zone))"
" -> Seq Scan on ship_a_pos_messages ship_pos_messages
(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0
loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
1::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-01 00:00:00'::timestamp without time zone))"
" -> Index Scan using
idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages
ship_pos_messages (cost=0.00..58657.09 rows=5269 width=128) (actual
time=0.032..799.171 rows=986344 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Index Scan using
idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages
ship_pos_messages (cost=0.00..1694.64 rows=141 width=128) (actual
time=0.026..20.661 rows=26979 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Index Scan using
idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0
ship_pos_messages (cost=0.00..1109877.86 rows=99313 width=128) (actual
time=0.029..435784.376 rows=18499560 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec)
= '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Sort (cost=35258.79..36087.50 rows=331486 width=8) (actual
time=239.908..307.576 rows=349984 loops=1)"
" Sort Key: tmp_tbl_messages.msg_id"
" Sort Method: quicksort Memory: 28694kB"
" -> Seq Scan on tmp_tbl_messages (cost=0.00..4863.86
rows=331486 width=8) (actual time=0.047..55.227 rows=349984 loops=1)"
"Total runtime: 479336.869 ms"
Which is a Merge join and not a hash. Any ideas how to make it a hash join?
Kind Regards
Yiannis
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-08-07 03:13:27 | Re: Sequential scan instead of index scan |
Previous Message | Midge Brown | 2012-08-06 18:36:43 | Re: slow query, different plans |