Re: Help with tuning this query (more musings)

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Ken Egervari <ken(at)upfactor(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help with tuning this query (more musings)
Date: 2005-03-03 05:04:59
Message-ID: 42269AFB.9010700@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers-win32 pgsql-performance

Ken Egervari wrote:

>> I took John's advice and tried to work with sub-selects. I tried
>> this variation, which actually seems like it would make a difference
>> conceptually since it drives on the person table quickly. But to my
>> surprise, the query runs at about 375 milliseconds. I think it's
>> because it's going over that shipment table multiple times, which is
>> where the results are coming from.
>
>
> I also made a version that runs over shipment a single time, but it's
> exactly 250 milliseconds. I guess the planner does the exact same thing.
>
Why are you now left joining driver and carrier code, but inner joining
shipment_status? I assume this is the *real* query that you are executing.

From the earlier explain analyze, and your statements, the initial
person p should be the heavily selective portion.

And what does "driver" get you? It isn't in the return, and it isn't
part of a selectivity clause.
You are also double joining against carrier code, once as a left outer
join, and once in the inner join.

This query doesn't seem quite right. Are you sure it is generating the
rows you are expecting?

> select s.*, ss.*
>
> from shipment s
> inner join shipment_status ss on s.current_status_id=ss.id
> inner join release_code rc on ss.release_code_id=rc.id
> left outer join driver d on s.driver_id=d.id
> left outer join carrier_code cc on s.carrier_code_id=cc.id
> where s.carrier_code_id in (
> select cc.id
> from person p
> inner join carrier_to_person ctp on p.id=ctp.person_id
> inner join carrier c on ctp.carrier_id=c.id
> inner join carrier_code cc on cc.carrier_id = c.id
> where p.id = 355
> )
> and s.current_status_id is not null
> and s.is_purged=false
> and(rc.number='9' )
> and(ss.date>=current_date-31 )
>
> order by ss.date desc

You might want to post the explain analyze of this query to have a point
of reference, but what about something like this:
select s.*, ss.*

from shipment_status ss on s.current_status_id=ss.id
join (select s.* from shipment s
where s.carrier_code_id in
(select cc.id
from person p
inner join carrier_to_person ctp on p.id=ctp.person_id
inner join carrier c on ctp.carrier_id=c.id
inner join carrier_code cc on cc.carrier_id = c.id
where p.id = 355
)
and s.current_status_id is not null
and s.is_purged=false
) as i -- Just a name for the subselect since it is in a join
inner join release_code rc on ss.release_code_id=rc.id
where (rc.number='9' )
and(ss.date between current_date-31 and current_date())

order by ss.date desc

My idea with this query is to minimize the number of shipment rows that
need to be generated before joining with the other rows. My syntax is
probably a little bit off, since I can't actually run it against real
tables.
But looking at your *original* query, you were getting 15000 rows out of
shipment_status, and then 27700 rows out of shipment, which was then
being merge-joined down to only 300 rows, and then hash-joined down to 39.

I'm just trying to think of ways to prevent it from blossoming into 27k
rows to start with.

Please double check your query, because it seems to be grabbing
unnecessary rows with the left joins, and then post another explain
analyze with one (or several) different subselect forms.

John
=:->

In response to

Responses

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message Josh Berkus 2005-03-03 05:36:23 Re: Help with tuning this query
Previous Message Mark Kirkwood 2005-03-03 04:30:16 Re: Help with tuning this query

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-03-03 05:31:01 Re: Performance tradeoff
Previous Message Mark Kirkwood 2005-03-03 04:30:16 Re: Help with tuning this query