Query tuning help

From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query tuning help
Date: 2005-05-08 23:20:35
Message-ID: 924ac6c029f5453c442eacdda06b132c@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Sorry to bother everyone with yet another "my query isn't using an
index" problem but I am over my head on this one.. I am open to ways
of restructuring this query to perform better.

I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has
been just run on the table.

This is the result of:

explain analyze
select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea, em, eg
where
em.incidentid = ea.incidentid and
em.incidentid = eg.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00'
and ea.incidentid in (
select
incidentid
from
ea
where
recordtext like '%RED%'
)

and ea.incidentid in (
select
incidentid
from
ea
where
recordtext like '%CORVETTE%'
)
and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' )
order by em.entrydate

---------------------
ANALYZE RESULTS
---------------------

Unique (cost=774693.72..774693.76 rows=1 width=159) (actual
time=446787.056..446787.342 rows=72 loops=1)
-> Sort (cost=774693.72..774693.72 rows=1 width=159) (actual
time=446787.053..446787.075 rows=72 loops=1)
Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
-> Nested Loop (cost=771835.10..774693.71 rows=1 width=159)
(actual time=444378.655..446786.746 rows=72 loops=1)
-> Nested Loop (cost=771835.10..774688.81 rows=1
width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
-> Nested Loop IN Join
(cost=771835.10..774678.88 rows=2 width=81) (actual
time=444367.080..446191.864 rows=701 loops=1)
-> Nested Loop (cost=771835.10..774572.05
rows=42 width=64) (actual time=444366.859..445463.232 rows=1011
loops=1)
-> HashAggregate
(cost=771835.10..771835.10 rows=1 width=17) (actual
time=444366.702..444368.583 rows=473 loops=1)
-> Seq Scan on ea
(cost=0.00..771834.26 rows=335 width=17) (actual
time=259.746..444358.837 rows=592 loops=1)
Filter:
((recordtext)::text ~~ '%CORVETTE%'::text)
-> Index Scan using ea1 on ea
(cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2
loops=473)
Index Cond:
((ea.incidentid)::text = ("outer".incidentid)::text)
Filter: (((recordtext)::text ~~
'%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
-> Index Scan using ea1 on ea
(cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1
loops=1011)
Index Cond:
(("outer".incidentid)::text = (ea.incidentid)::text)
Filter: ((recordtext)::text ~~
'%RED%'::text)
-> Index Scan using em_incidentid_idx on em
(cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0
loops=701)
Index Cond: (("outer".incidentid)::text =
(em.incidentid)::text)
Filter: ((entrydate >= '2005-01-01
00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09
00:00:00'::timestamp without time zone))
-> Index Scan using eg_incidentid_idx on eg
(cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1
loops=72)
Index Cond: (("outer".incidentid)::text =
(eg.incidentid)::text)
Total runtime: 446871.880 ms
(22 rows)

-------------------------
EXPLANATION
-------------------------
The reason for the redundant LIKE clause is that first, I only want
those "incidentid"s that contain the words 'RED' and 'CORVETTE'. BUT,
those two words may exist across multiple records with the same
incidentid. Then, I only want to actually work with the rows that
contain one of the words. This query will repeat the same logic for
however many keywords are entered by the user. I have investigated
text searching options and have not found them to be congruous with my
application.

Why is it choosing a sequential scan one part of the query when
searching for the words, yet using an index scan for another part of
it? Is there a better way to structure the query to give it better
hints?

I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.

Thank you for any advice.

-Dan

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-05-09 00:48:18 Re: Query tuning help
Previous Message Mischa Sandberg 2005-05-08 22:33:11 Re: Whence the Opterons?