Re: Sequential scan instead of index scan

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential scan instead of index scan
Date: 2012-08-06 15:24:57
Message-ID: 501FE1C9.6010209@anatec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

They are random as the data are coming from multiple threads that are
inserting in the database. I see what you say about "linking them", and
I may give it a try with the date. The other think that "links" them
together is the 4 georef fields, however at that stage I am trying to
collect statistics on the georefs population of "msg_id" so I don't know
before hand the values to limit my query on them... Do you think an
index on "date, msg_id" might do something?

Yiannis

On 06/08/2012 16:16, David Barton wrote:
> Hi Yiannis,
>
> Is there anything linking these ids together, or are the relatively
> random? If they are relatively random, the rows are likely to be
> sprinkled amongst many blocks and so a seq scan is the fastest. I've
> seen similar problems with indexed queries in a multi-tennant database
> where the data is so fragmented that once the record volume hits a
> certain threshold, Postgres decides to table scan rather than use an
> index.
>
> The query optimiser is unlikely to be able to determine the disk
> locality of 300k rows and so it just takes a punt on a seq scan.
>
> If you added another filter condition on something indexed e.g. last
> week or last month or location or something, you might do better if
> the data does exhibit disk locality. If the data really is scattered,
> then a seq scan really will be quicker.
>
> Regards, David
>
> On 06/08/12 23:08, Ioannis Anagnostopoulos wrote:
>> Hi, my query is very simple:
>>
>> select
>> msg_id,
>> msg_type,
>> ship_pos_messages.pos_georef1,
>> ship_pos_messages.pos_georef2,
>> ship_pos_messages.pos_georef3,
>> ship_pos_messages.pos_georef4,
>> obj_id,
>> ship_speed,
>> ship_heading,
>> ship_course,
>> pos_point
>> from
>> feed_all_y2012m08.ship_pos_messages
>> where
>> extract('day' from msg_date_rec) = 1
>> AND msg_id = any(ARRAY[7294724,14174174,22254408]);
>>
>> The msg_id is the pkey on the ship_pos_messages table and in this
>> example it is working fast as it uses the pkey (primary key index) to
>> make the selection. The expplain anayze follows:
>> "Result (cost=0.00..86.16 rows=5 width=117) (actual
>> time=128.734..163.319 rows=3 loops=1)"
>> " -> Append (cost=0.00..86.16 rows=5 width=117) (actual
>> time=128.732..163.315 rows=3 loops=1)"
>> " -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1
>> width=100) (actual time=0.001..0.001 rows=0 loops=1)"
>> " Filter: ((msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[])) AND
>> (date_part('day'::text, msg_date_rec) = 1::double precision))"
>> " -> Seq Scan on ship_a_pos_messages ship_pos_messages
>> (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0
>> loops=1)"
>> " Filter: ((msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[])) AND
>> (date_part('day'::text, msg_date_rec) = 1::double precision))"
>> " -> Bitmap Heap Scan on ship_b_std_pos_messages
>> ship_pos_messages (cost=13.41..25.42 rows=1 width=128) (actual
>> time=49.127..49.127 rows=0 loops=1)"
>> " Recheck Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> " Filter: (date_part('day'::text, msg_date_rec) =
>> 1::double precision)"
>> " -> Bitmap Index Scan on ship_b_std_pos_messages_pkey
>> (cost=0.00..13.41 rows=3 width=0) (actual time=49.125..49.125 rows=0
>> loops=1)"
>> " Index Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> " -> Bitmap Heap Scan on ship_b_ext_pos_messages
>> ship_pos_messages (cost=12.80..24.62 rows=1 width=128) (actual
>> time=0.029..0.029 rows=0 loops=1)"
>> " Recheck Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> " Filter: (date_part('day'::text, msg_date_rec) =
>> 1::double precision)"
>> " -> Bitmap Index Scan on ship_b_ext_pos_messages_pkey
>> (cost=0.00..12.80 rows=3 width=0) (actual time=0.027..0.027 rows=0
>> loops=1)"
>> " Index Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> " -> Bitmap Heap Scan on ship_a_pos_messages_wk0
>> ship_pos_messages (cost=24.08..36.12 rows=1 width=128) (actual
>> time=79.572..114.152 rows=3 loops=1)"
>> " Recheck Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> " Filter: (date_part('day'::text, msg_date_rec) =
>> 1::double precision)"
>> " -> Bitmap Index Scan on ship_a_pos_messages_wk0_pkey
>> (cost=0.00..24.08 rows=3 width=0) (actual time=67.441..67.441 rows=3
>> loops=1)"
>> " Index Cond: (msg_id = ANY
>> ('{7294724,14174174,22254408}'::integer[]))"
>> "Total runtime: 180.146 ms"
>>
>> I think this is a pretty good plan and quite quick given the size of
>> the table (88Million rows at present). However in real life the
>> parameter where I search for msg_id is not an array of 3 ids but of
>> 300.000 or more. It is then that the query forgets the plan and goes
>> to sequential scan. Is there any way around? Or is this the best I
>> can have?
>>
>> Kind Regards
>> Yiannis
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-08-06 15:34:08 Re: Sequential scan instead of index scan
Previous Message David Barton 2012-08-06 15:16:29 Re: Sequential scan instead of index scan