Re: Query tuning help

From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query tuning help
Date: 2005-05-09 00:51:14
Message-ID: 200505091051.14689.mr-russ@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> Sorry to bother everyone with yet another "my query isn't using an
> index" problem but I am over my head on this one.. I am open to ways
> of restructuring this query to perform better.
>
> I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has
> been just run on the table.
>
> This is the result of:
>
> explain analyze
> select distinct
> em.incidentid,
> ea.recordtext as retdata,
> eg.long,
> eg.lat
> from
> ea, em, eg
> where
> em.incidentid = ea.incidentid and
> em.incidentid = eg.incidentid and
> em.entrydate >= '2005-1-1 00:00' and
> em.entrydate <= '2005-5-9 00:00'
> and ea.incidentid in (
> select
> incidentid
> from
> ea
> where
> recordtext like '%RED%'
> )
>
> and ea.incidentid in (
> select
> incidentid
> from
> ea
> where
> recordtext like '%CORVETTE%'
> )
> and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' )
> order by em.entrydate
>
You cannot use an index for %CORVETTE%, or %RED%. There is no way
for the index to know if a row had that in the middle without scanning the whole
index. So it's much cheaper to do a sequence scan.

One possible way to make the query faster is to limit based on date, as you will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans. I think this query does what you do
above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%' );

>
> ---------------------
> ANALYZE RESULTS
> ---------------------
>
> Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
> -> Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
> Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
> -> Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
> -> Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
> -> Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
> -> Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
> -> HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
> -> Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
> Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
> -> Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
> Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
> Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
> -> Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
> Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
> Filter: ((recordtext)::text ~~ '%RED%'::text)
> -> Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
> Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
> Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
> -> Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
> Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
> Total runtime: 446871.880 ms
> (22 rows)
>
>
> -------------------------
> EXPLANATION
> -------------------------
> The reason for the redundant LIKE clause is that first, I only want
> those "incidentid"s that contain the words 'RED' and 'CORVETTE'. BUT,
> those two words may exist across multiple records with the same
> incidentid. Then, I only want to actually work with the rows that
> contain one of the words. This query will repeat the same logic for
> however many keywords are entered by the user. I have investigated
> text searching options and have not found them to be congruous with my
> application.
>
> Why is it choosing a sequential scan one part of the query when
> searching for the words, yet using an index scan for another part of
> it? Is there a better way to structure the query to give it better
> hints?
>
> I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.
>
> Thank you for any advice.
>
> -Dan
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-05-09 00:58:22 Re: Query tuning help
Previous Message Josh Berkus 2005-05-09 00:48:18 Re: Query tuning help