A very long running query....

From: Ioannis Anagnostopoulos <ioannis(at)anatec(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: A very long running query....
Date: 2012-07-20 21:19:22
Message-ID: 5009CB5A.7070801@anatec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-performance

Hello,
the following query seems to take ages to get executed. However I am
more than sure (as you can see from the explain analyse) that uses all
the correct indexes. In general I have serious issues with joins in my
database. This is a Postgres ver. 9.0 running postgis with the
"_int.sql" contrib enabled. Further more I think that the execution of
this query seriously degrades the performance of the database. I had to
device this query and run it like an overnight batch to populate a
table as I couldn't afford users to execute it over and over in a "need
to do" base. Unfortunately it is still slow and some times it either
brings down the whole database (my insertions are buffered on the app
server) or it never completes before morning.

SELECT
src_id,
date_trunc('day', message_copies.msg_date_rec) as date_count,
message_copies.pos_georef1,
message_copies.pos_georef2,
message_copies.pos_georef3,
message_copies.pos_georef4,
ais_server.array_accum(CASE WHEN msg_type BETWEEN 1 and 3
THEN message_copies.msg_id END) as msgA_array,
ais_server.array_accum(CASE WHEN msg_type = 18 THEN
message_copies.msg_id END) as msgB_std_array,
ais_server.array_accum(CASE WHEN msg_type = 19 THEN
message_copies.msg_id END) as msgB_ext_array,
uniq
(
ais_server.array_accum(CASE WHEN obj_type = 'SHIP_TYPE_A'
THEN obj_mmsi END)
) as mmsi_type_A_array,
uniq
(
ais_server.array_accum(CASE WHEN obj_type = 'SHIP_TYPE_B'
THEN obj_mmsi END)
) as mmsi_type_B_array,
avg(ship_speed) / 10.0 as avg_speed,
avg(ship_heading) as avg_heading,
avg(ship_course) / 10.0 as avg_course,
ST_Multi(ST_Collect(ship_pos_messages.pos_point)) as geom
from
feed_all_y2012m07.message_copies join
(feed_all_y2012m07.ship_pos_messages join
ais_server.ship_objects on (ship_pos_messages.obj_id =
ship_objects.obj_id))
on (message_copies.msg_id = ship_pos_messages.msg_id)
where
extract('day' from message_copies.msg_date_rec) = 17
and date_trunc('day', message_copies.msg_date_rec) = '2012-07-17'
and message_copies.src_id = 1
and (message_copies.pos_georef1 || message_copies.pos_georef2
|| message_copies.pos_georef3 || message_copies.pos_georef4) <> ''
and not (message_copies.pos_georef1 ||
message_copies.pos_georef2 || message_copies.pos_georef3 ||
message_copies.pos_georef4) is null
and extract('day' from ship_pos_messages.msg_date_rec) = 17
group by src_id, date_count, message_copies.pos_georef1,
message_copies.pos_georef2, message_copies.pos_georef3,
message_copies.pos_georef4;

What follows is the Explain Analyze:
"HashAggregate (cost=21295.20..21298.51 rows=53 width=148) (actual
time=17832235.321..17832318.546 rows=2340 loops=1)"
" -> Nested Loop (cost=0.00..21293.21 rows=53 width=148) (actual
time=62.188..17801780.764 rows=387105 loops=1)"
" -> Nested Loop (cost=0.00..20942.93 rows=53 width=144)
(actual time=62.174..17783236.718 rows=387105 loops=1)"
" Join Filter: (feed_all_y2012m07.message_copies.msg_id =
feed_all_y2012m07.ship_pos_messages.msg_id)"
" -> Append (cost=0.00..19057.93 rows=53 width=33)
(actual time=62.124..5486473.545 rows=387524 loops=1)"
" -> Seq Scan on message_copies (cost=0.00..0.00
rows=1 width=68) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: ((src_id = 1) AND
(date_trunc('day'::text, msg_date_rec) = '2012-07-17
00:00:00'::timestamp without time zone) AND (date_part('day'::text,
msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
|| (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
IS NULL)) AND (((((pos_georef1)::text || (pos_georef2)::text) ||
(pos_georef3)::text) || (pos_georef4)::text) <> ''::text))"
" -> Index Scan using
idx_message_copies_wk2_date_src_pos_partial on message_copies_wk2
message_copies (cost=0.00..19057.93 rows=52 width=32) (actual
time=62.124..5486270.845 rows=387524 loops=1)"
" Index Cond: ((date_trunc('day'::text,
msg_date_rec) = '2012-07-17 00:00:00'::timestamp without time zone) AND
(src_id = 1))"
" Filter: ((date_part('day'::text,
msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
|| (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
IS NULL)) AND (((((pos_georef1)::text || (pos_georef2)::text) ||
(pos_georef3)::text) || (pos_georef4)::text) <> ''::text))"
" -> Append (cost=0.00..35.50 rows=5 width=93) (actual
time=31.684..31.724 rows=1 loops=387524)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00
rows=1 width=52) (actual time=0.001..0.001 rows=0 loops=387524)"
" Filter: (date_part('day'::text,
feed_all_y2012m07.ship_pos_messages.msg_date_rec) = 17::double precision)"
" -> Seq Scan on ship_a_pos_messages
ship_pos_messages (cost=0.00..0.00 rows=1 width=52) (actual
time=0.000..0.000 rows=0 loops=387524)"
" Filter: (date_part('day'::text,
feed_all_y2012m07.ship_pos_messages.msg_date_rec) = 17::double precision)"
" -> Index Scan using ship_b_std_pos_messages_pkey
on ship_b_std_pos_messages ship_pos_messages (cost=0.00..9.03 rows=1
width=120) (actual time=0.008..0.008 rows=0 loops=387524)"
" Index Cond:
(feed_all_y2012m07.ship_pos_messages.msg_id =
feed_all_y2012m07.message_copies.msg_id)"
" Filter: (date_part('day'::text,
feed_all_y2012m07.ship_pos_messages.msg_date_rec) = 17::double precision)"
" -> Index Scan using ship_b_ext_pos_messages_pkey
on ship_b_ext_pos_messages ship_pos_messages (cost=0.00..7.90 rows=1
width=120) (actual time=0.004..0.004 rows=0 loops=387524)"
" Index Cond:
(feed_all_y2012m07.ship_pos_messages.msg_id =
feed_all_y2012m07.message_copies.msg_id)"
" Filter: (date_part('day'::text,
feed_all_y2012m07.ship_pos_messages.msg_date_rec) = 17::double precision)"
" -> Index Scan using ship_a_pos_messages_wk2_pkey
on ship_a_pos_messages_wk2 ship_pos_messages (cost=0.00..18.57 rows=1
width=120) (actual time=31.670..31.710 rows=1 loops=387524)"
" Index Cond:
(feed_all_y2012m07.ship_pos_messages.msg_id =
feed_all_y2012m07.message_copies.msg_id)"
" Filter: (date_part('day'::text,
feed_all_y2012m07.ship_pos_messages.msg_date_rec) = 17::double precision)"
" -> Index Scan using ship_objects_pkey on ship_objects
(cost=0.00..6.59 rows=1 width=12) (actual time=0.041..0.044 rows=1
loops=387105)"
" Index Cond: (ship_objects.obj_id =
feed_all_y2012m07.ship_pos_messages.obj_id)"
"Total runtime: 17832338.082 ms"

A few more information: feed_all_y2012m07.message_copies_wk2 has 24.5
million rows only for the 17th of July and more or less the same amount
for rows per day since the 15th that I started populating it. So I guess
we are looking around 122million rows. The tables are populated with
around 16K rows per minute.

As always any help will be greatly appreciated.
Kind Regards
Yiannis

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Claudio Freire 2012-07-20 21:23:52 Re: A very long running query....
Previous Message Ioannis Anagnostopoulos 2012-07-19 12:24:34 Re: Index slow down insertions...

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2012-07-20 21:23:52 Re: A very long running query....
Previous Message Laszlo Nagy 2012-07-20 07:10:23 Re: queries are fast after dump->restore but slow again after some days dispite vacuum