Re: Sequential scan instead of index scan

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Sequential scan instead of index scan
Date: 2012-08-07 12:15:06
Message-ID: 502106CA.6090406@anatec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Offhand I'd have thought that ANALYZE would gather stats on the
> date_trunc expression (because it is indexed) and then you should get
> something reasonably accurate for a comparison to a constant.
> "Reasonably accurate" meaning "not off by two orders of magnitude".
> Practically all of your runtime is going into this one indexscan,
> and TBH it seems likely you'd be better off with a seqscan there.
>
> regards, tom lane
You were right, after running ANALYZE on the temp table I eventually got
the HASH JOIN we were talking about. Here is the plan:

"Hash Join (cost=379575.54..1507341.18 rows=95142 width=128) (actual
time=3128.940..634179.270 rows=10495795 loops=1)"
" Hash Cond: (feed_all_y2012m08.ship_pos_messages.msg_id =
tmp_tbl_messages.msg_id)"
" -> Append (cost=0.00..1073525.24 rows=95142 width=128) (actual
time=37.157..599002.314 rows=18891614 loops=1)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1
width=100) (actual time=0.001..0.001 rows=0 loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
2::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone))"
" -> Seq Scan on ship_a_pos_messages ship_pos_messages
(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0
loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) =
2::double precision) AND (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone))"
" -> Index Scan using idx_ship_b_std_pos_messages_date_trunc on
ship_b_std_pos_messages ship_pos_messages (cost=0.00..48111.95
rows=4323 width=128) (actual time=37.156..23782.030 rows=808692 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 2::double
precision)"
" -> Index Scan using idx_ship_b_ext_pos_messages_date_trunc on
ship_b_ext_pos_messages ship_pos_messages (cost=0.00..1844.30 rows=154
width=128) (actual time=42.042..1270.104 rows=28656 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 2::double
precision)"
" -> Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on
ship_a_pos_messages_wk0 ship_pos_messages (cost=0.00..1023568.99
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266
loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 2::double
precision)"
" -> Hash (cost=177590.46..177590.46 rows=12311446 width=8) (actual
time=3082.762..3082.762 rows=12311446 loops=1)"
" Buckets: 524288 Batches: 4 Memory Usage: 120316kB"
" -> Seq Scan on tmp_tbl_messages (cost=0.00..177590.46
rows=12311446 width=8) (actual time=0.022..1181.376 rows=12311446 loops=1)"
"Total runtime: 634764.596 ms"

The time looks reasonable but still quite high for the over night job I
am need it for (have to run around 30 of those). So since the join has
been shorted I think I need to do something with the rows difference
between actual and expected in the:

" -> Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on
ship_a_pos_messages_wk0 ship_pos_messages (cost=0.00..1023568.99
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266
loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) =
'2012-08-02 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 2::double
precision)"

From what I understand a possible solution is to increase the stats
target for the particular column(?). Any suggestion there? I assume we
are talking about the msg_date_rec where the index is build uppon.
Finally, I do understand what you say about the Seq scan. However in
this case I have consistently about 10min per execution while the
SeqScan was giving me almost nothing at best and usually it was running
for so long that
eventually was causing my server problems...

Kind Regards
Yiannis

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2012-08-07 12:16:43 Re: Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Previous Message Stefan Keller 2012-08-07 12:01:30 Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m