Skip site navigation (1) Skip section navigation (2)

Re: Sequential scan instead of index scan

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Sequential scan instead of index scan
Date: 2012-08-07 12:15:06
Message-ID: 502106CA.6090406@anatec.com (view raw or flat)
Thread:
Lists: pgsql-performance
> Offhand I'd have thought that ANALYZE would gather stats on the
> date_trunc expression (because it is indexed) and then you should get
> something reasonably accurate for a comparison to a constant.
> "Reasonably accurate" meaning "not off by two orders of magnitude".
> Practically all of your runtime is going into this one indexscan,
> and TBH it seems likely you'd be better off with a seqscan there.
>
> 			regards, tom lane
You were right, after running ANALYZE on the temp table I eventually got 
the HASH JOIN we were talking about. Here is the plan:

"Hash Join  (cost=379575.54..1507341.18 rows=95142 width=128) (actual 
time=3128.940..634179.270 rows=10495795 loops=1)"
"  Hash Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = 
tmp_tbl_messages.msg_id)"
"  ->  Append  (cost=0.00..1073525.24 rows=95142 width=128) (actual 
time=37.157..599002.314 rows=18891614 loops=1)"
"        ->  Seq Scan on ship_pos_messages  (cost=0.00..0.00 rows=1 
width=100) (actual time=0.001..0.001 rows=0 loops=1)"
"              Filter: ((date_part('day'::text, msg_date_rec) = 
2::double precision) AND (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone))"
"        ->  Seq Scan on ship_a_pos_messages ship_pos_messages 
(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 
loops=1)"
"              Filter: ((date_part('day'::text, msg_date_rec) = 
2::double precision) AND (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone))"
"        ->  Index Scan using idx_ship_b_std_pos_messages_date_trunc on 
ship_b_std_pos_messages ship_pos_messages  (cost=0.00..48111.95 
rows=4323 width=128) (actual time=37.156..23782.030 rows=808692 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"        ->  Index Scan using idx_ship_b_ext_pos_messages_date_trunc on 
ship_b_ext_pos_messages ship_pos_messages  (cost=0.00..1844.30 rows=154 
width=128) (actual time=42.042..1270.104 rows=28656 loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"        ->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on 
ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 
loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"  ->  Hash  (cost=177590.46..177590.46 rows=12311446 width=8) (actual 
time=3082.762..3082.762 rows=12311446 loops=1)"
"        Buckets: 524288  Batches: 4  Memory Usage: 120316kB"
"        ->  Seq Scan on tmp_tbl_messages (cost=0.00..177590.46 
rows=12311446 width=8) (actual time=0.022..1181.376 rows=12311446 loops=1)"
"Total runtime: 634764.596 ms"

The time looks reasonable but still quite high for the over night job I 
am need it for (have to run around 30 of those). So since the join has
been shorted I think I need to do something with the rows difference 
between actual and expected in the:

"        ->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on 
ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 
loops=1)"
"              Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"              Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"

 From what I understand a possible solution is to increase the stats 
target for the particular column(?). Any suggestion there? I assume we 
are talking about the msg_date_rec where the index is build uppon.
Finally, I do understand what you say about the Seq scan. However in 
this case I have consistently about 10min per execution while the 
SeqScan was giving me almost nothing at best and usually it was running 
for so long that
eventually was causing my server problems...

Kind Regards
Yiannis

In response to

pgsql-performance by date

Next:From: Tomas VondraDate: 2012-08-07 12:16:43
Subject: Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Previous:From: Stefan KellerDate: 2012-08-07 12:01:30
Subject: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group