Re: Help with tuning this query (Some musings)

From: "Ken Egervari" <ken(at)upfactor(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Help with tuning this query (Some musings)
Date: 2005-03-03 02:52:39
Message-ID: 002a01c51f9c$1066d0e0$6a01a8c0@KEN
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers-win32 pgsql-performance

I took John's advice and tried to work with sub-selects. I tried this
variation, which actually seems like it would make a difference conceptually
since it drives on the person table quickly. But to my surprise, the query
runs at about 375 milliseconds. I think it's because it's going over that
shipment table multiple times, which is where the results are coming from.

select s.*, ss.*

from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.id in (
select s.id
from person p
inner join carrier_to_person ctp on p.id=ctp.person_id
inner join carrier c on ctp.carrier_id=c.id
inner join carrier_code cc on cc.carrier_id = c.id
inner join shipment s on s.carrier_code_id = cc.id
where p.id = 355
and s.current_status_id is not null
and s.is_purged=false
)
and(rc.number='9' )
and(ss.date>=current_date-31 )

order by ss.date desc

*** Musing 1
Also, "s.current_status_id is not null" is an important filter that I forgot
to mention. In this example where p.id = 355, it filters out 90% of the
rows. In general, that filter ratio is 0.46 though, which is not quite so
high. However, this filter gets better over time because more and more
users will use a filter that will make this value null. It's still not as
strong as person though and probably never will be. But I thought I'd
mention it nonetheless.

*** Musing 2
I do think that the filter "ss.date>=current_date-31" is slowing this query
down. I don't think it's the mention of "current_date" or even that it's
dynamic instead of static. I think the range is just too big. For example,
if I use:

and ss.date between '2005-02-01 00:00:00' and '2005-02-28 23:59:59'

The query still results in 250 milliseconds. But if I make the range very
small - say Feb 22nd of 2005:

and ss.date between '2005-02-22 00:00:00' and '2005-02-22 23:59:59'

Now the entire query runs in 47 milliseconds on average. If I can't make
this query perform any better, should I change the user interface to select
the date instead of showing the last 31 days to benefit from this single-day
filter? This causes more clicks to select the day (like from a calendar),
but most users probably aren't interested in seeing the entire listing
anyway. However, it's a very important requirement that users know that
shipment enteries exist in the last 31 days (because they are usually
sure-fire problems if they are still in this query after a few days).

I guess I'm wondering if tuning the query is futile and I should get the
requirements changed, or is there something I can do to really speed it up?

Thanks again,
Ken

In response to

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message Mark Kirkwood 2005-03-03 04:30:16 Re: Help with tuning this query
Previous Message Ken Egervari 2005-03-03 02:51:55 Re: Help with tuning this query (more musings)

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2005-03-03 04:30:16 Re: Help with tuning this query
Previous Message Ken Egervari 2005-03-03 02:51:55 Re: Help with tuning this query (more musings)