Re: Select running slow on Postgres

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "samantha mahindrakar" <sam(dot)mahindrakar(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Select running slow on Postgres
Date: 2008-07-02 19:07:45
Message-ID: dcc563d10807021207q2fd1166cj16216fd5ba8e0234@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jul 2, 2008 at 1:01 PM, samantha mahindrakar
<sam(dot)mahindrakar(at)gmail(dot)com> wrote:
> I ran the explain analyze.Here is what i got:
>
>
> "Group (cost=112266.37..112266.40 rows=1 width=56) (actual
> time=5583.399..5615.476 rows=13373 loops=1)"
> " -> Sort (cost=112266.37..112266.38 rows=1 width=56) (actual
> time=5583.382..5590.890 rows=13373 loops=1)"
> " Sort Key: lane_data_07_08.lane_id,
> lane_data_07_08.measurement_start, lane_data_07_08.measurement_end,
> lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy,
> lane_data_07_08.quality, lane_data_07_08.effective_date"
> " -> Nested Loop IN Join (cost=0.00..112266.36 rows=1 width=56)
> (actual time=1100.307..5547.768 rows=13373 loops=1)"
> " -> Seq Scan on lane_data_07_08 (cost=0.00..112241.52 rows=3
> width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"

You can see here that the seq scan on lane_data is what's eating up
all your time. Also, since the row estimate is WAY off, it then chose
a nested loop thinking it would be joining up only 1 row and actually
running across 20k rows.

> " Filter: (((volume = 255::double precision) OR (speed =
> 255::double precision) OR (occupancy = 255::double precision) OR (occupancy
>>= 100::double precision) OR (volume > 52::double precision) OR (volume <
> 0::double precision) OR (speed > 120::double precision) OR (speed <
> 0::double precision)) AND (date_part('hour'::text, measurement_start) >=
> 5::double precision) AND (date_part('hour'::text, measurement_start) <=
> 23::double precision) AND (date_part('day'::text, measurement_start) =
> 1::double precision))"
> " -> Index Scan using lane_info_pk on
> lane_info (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1
> loops=20581)"
> " Index Cond: (lane_data_07_08.lane_id =
> lane_info.lane_id)"
> " Filter: (inactive IS NULL)"
> "Total runtime: 5621.409 ms"
>
>
> Well instaed of creating extra indexes (since they eat up lot of space) i
> made use of the whole measurement_start field, so thet it uses the index
> proeprty and makes the search faster.
> So i changed the query to include the measuerment start as follows:
>
> SELECT lane_id,measurement_start,
> measurement_end,speed,volume,occupancy,quality,effective_date
> FROM tss.lane_data_06_08
> WHERE lane_id in(select lane_id from lane_info where inactive is null )
> AND measurement_start between '2008-06-30 05:00:00-04' AND '2008-06-30
> 23:00:00-04'
> GROUP BY
> lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> ORDER BY lane_id, measurement_start

Yeah, anytime you can just compare date / timestamp on an indexed
field you'll do better. If you find yourself needing to use the other
syntax, so you can, for instance, grab the data for 5 days in a row
from 5am to 11am or something, then the method I mentioned of making
indexes on date_part are a good choice. Note that you need regular
timestamp, not timstamptz to create indexes.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Abhijit Menon-Sen 2008-07-03 07:59:00 switchover between index and sequential scans
Previous Message samantha mahindrakar 2008-07-02 19:01:41 Re: Select running slow on Postgres