Re: Sequential scan instead of index scan

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential scan instead of index scan
Date: 2012-08-06 15:24:57
Message-ID: 501FE1C9.6010209@anatec.com
Thread:
Lists: pgsql-performance
They are random, as the data come from multiple threads inserting into 
the database. I see what you say about "linking them", and I may give 
it a try with the date. The other thing that "links" them together is 
the 4 georef fields; however, at this stage I am trying to collect 
statistics on the georef population of "msg_id", so I don't know the 
values beforehand to limit my query on them... Do you think an index 
on "date, msg_id" might do something?

Yiannis
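A sketch of the kind of index being discussed, matching the extract('day' ...) predicate used in the query quoted below. This DDL is a hypothetical illustration, not from the thread, and assumes msg_date_rec is a timestamp without time zone (expression indexes require an immutable expression):

```sql
-- Hypothetical sketch: a plain index on (msg_date_rec, msg_id) would not
-- match the predicate extract('day' from msg_date_rec) = 1; an index on
-- the extracted day can.  date_part() is used because it is an ordinary
-- immutable function call, which Postgres accepts directly in an index
-- definition.
CREATE INDEX ship_pos_messages_day_msgid_idx
    ON feed_all_y2012m08.ship_pos_messages (date_part('day', msg_date_rec), msg_id);

-- With inheritance-based partitioning (as the child tables in the plan
-- suggest), the index would have to be created on each child table too.
```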

On 06/08/2012 16:16, David Barton wrote:
> Hi Yiannis,
>
> Is there anything linking these ids together, or are they relatively 
> random?  If they are relatively random, the rows are likely to be 
> sprinkled amongst many blocks and so a seq scan is the fastest.  I've 
> seen similar problems with indexed queries in a multi-tenant database 
> where the data is so fragmented that once the record volume hits a 
> certain threshold, Postgres decides to table scan rather than use an 
> index.
>
> The query optimiser is unlikely to be able to determine the disk 
> locality of 300k rows and so it just takes a punt on a seq scan.
>
> If you added another filter condition on something indexed e.g. last 
> week or last month or location or something, you might do better if 
> the data does exhibit disk locality.  If the data really is scattered, 
> then a seq scan really will be quicker.
>
> Regards, David
>
> On 06/08/12 23:08, Ioannis Anagnostopoulos wrote:
>> Hi, my query is very simple:
>>
>> select
>>             msg_id,
>>             msg_type,
>>             ship_pos_messages.pos_georef1,
>>             ship_pos_messages.pos_georef2,
>>             ship_pos_messages.pos_georef3,
>>             ship_pos_messages.pos_georef4,
>>             obj_id,
>>             ship_speed,
>>             ship_heading,
>>             ship_course,
>>             pos_point
>>         from
>>             feed_all_y2012m08.ship_pos_messages
>>         where
>>             extract('day' from msg_date_rec) = 1
>>             AND msg_id = any(ARRAY[7294724,14174174,22254408]);
>>
>> The msg_id is the pkey on the ship_pos_messages table, and in this 
>> example it is working fast as it uses the pkey (primary key index) to 
>> make the selection. The explain analyze output follows:
>> "Result  (cost=0.00..86.16 rows=5 width=117) (actual 
>> time=128.734..163.319 rows=3 loops=1)"
>> "  ->  Append  (cost=0.00..86.16 rows=5 width=117) (actual 
>> time=128.732..163.315 rows=3 loops=1)"
>> "        ->  Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1 
>> width=100) (actual time=0.001..0.001 rows=0 loops=1)"
>> "              Filter: ((msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[])) AND 
>> (date_part('day'::text, msg_date_rec) = 1::double precision))"
>> "        ->  Seq Scan on ship_a_pos_messages ship_pos_messages  
>> (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 
>> loops=1)"
>> "              Filter: ((msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[])) AND 
>> (date_part('day'::text, msg_date_rec) = 1::double precision))"
>> "        ->  Bitmap Heap Scan on ship_b_std_pos_messages 
>> ship_pos_messages  (cost=13.41..25.42 rows=1 width=128) (actual 
>> time=49.127..49.127 rows=0 loops=1)"
>> "              Recheck Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "              Filter: (date_part('day'::text, msg_date_rec) = 
>> 1::double precision)"
>> "              ->  Bitmap Index Scan on ship_b_std_pos_messages_pkey  
>> (cost=0.00..13.41 rows=3 width=0) (actual time=49.125..49.125 rows=0 
>> loops=1)"
>> "                    Index Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "        ->  Bitmap Heap Scan on ship_b_ext_pos_messages 
>> ship_pos_messages  (cost=12.80..24.62 rows=1 width=128) (actual 
>> time=0.029..0.029 rows=0 loops=1)"
>> "              Recheck Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "              Filter: (date_part('day'::text, msg_date_rec) = 
>> 1::double precision)"
>> "              ->  Bitmap Index Scan on ship_b_ext_pos_messages_pkey  
>> (cost=0.00..12.80 rows=3 width=0) (actual time=0.027..0.027 rows=0 
>> loops=1)"
>> "                    Index Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "        ->  Bitmap Heap Scan on ship_a_pos_messages_wk0 
>> ship_pos_messages  (cost=24.08..36.12 rows=1 width=128) (actual 
>> time=79.572..114.152 rows=3 loops=1)"
>> "              Recheck Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "              Filter: (date_part('day'::text, msg_date_rec) = 
>> 1::double precision)"
>> "              ->  Bitmap Index Scan on ship_a_pos_messages_wk0_pkey  
>> (cost=0.00..24.08 rows=3 width=0) (actual time=67.441..67.441 rows=3 
>> loops=1)"
>> "                    Index Cond: (msg_id = ANY 
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "Total runtime: 180.146 ms"
>>
>> I think this is a pretty good plan and quite quick given the size of 
>> the table (88 million rows at present). However, in real life the 
>> parameter where I search for msg_id is not an array of 3 ids but of 
>> 300,000 or more. It is then that the planner abandons this plan and 
>> falls back to a sequential scan. Is there any way around this? Or is 
>> this the best I can have?
>>
>> Kind Regards
>> Yiannis
>
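For the 300,000-id case described above, one common workaround (a hedged sketch, not something proposed in this thread; the wanted_ids table and its load step are hypothetical) is to bulk-load the ids into a temporary table and join against it, so the planner can estimate the size of the id set and choose a join strategy rather than evaluating a huge ANY(ARRAY[...]) literal per row:

```sql
-- Hypothetical sketch: load the id list into a temp table and join.
CREATE TEMP TABLE wanted_ids (msg_id integer PRIMARY KEY);
-- Bulk-load the id list, e.g.:
-- COPY wanted_ids FROM '/path/to/ids.csv';
ANALYZE wanted_ids;  -- give the planner row estimates for the id set

SELECT m.msg_id, m.msg_type,
       m.pos_georef1, m.pos_georef2, m.pos_georef3, m.pos_georef4,
       m.obj_id, m.ship_speed, m.ship_heading, m.ship_course, m.pos_point
FROM   feed_all_y2012m08.ship_pos_messages m
JOIN   wanted_ids w USING (msg_id)
WHERE  extract('day' from m.msg_date_rec) = 1;
```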
