Re: Query optimizing

From: "Sebastian Ritter" <ritter(dot)sebastian(at)gmail(dot)com>
To: "Helio Campos Mello de Andrade" <helio(dot)campos(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Query optimizing
Date: 2008-11-10 11:58:02
Message-ID: 99b656cb0811100358v71788c28s51d08d5512200b21@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi Helio,

Sorry about the parenthesis - Bad copy/pasting skills! To further discuss
your suggestion: Wouldn't adding n_issue=i.id as a where clause filter cause
the sub-query to become correlated and thus much less efficient ? I may be
wrong, or may have miss-understood your suggestion.

Thanks for you help,
Sebastian

On Mon, Nov 10, 2008 at 11:48 AM, Helio Campos Mello de Andrade <
helio(dot)campos(at)gmail(dot)com> wrote:

> Hi Sebastian,
>
> - First of all i think there is an open-parenthesis missing in the query
> V2.
> Maybe in the V2 version you cold restrict the results in the INNER query a
> bit more if you use a restriction clause like "WHERE n_issue = i.id" in
> that. It will certainly lower the number of rows returned by it to only 1
> result.
>
> Regards
>
> --
> Helio Campos Mello de Andrade
>
>
>
> On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <
> ritter(dot)sebastian(at)gmail(dot)com> wrote:
>
>> Hi all,
>>
>> I was hoping to receive some advise on a slow running query in our
>> business' Issue Tracking System. To shed some light on the below mentioned
>> queries, here is a brief summary of how users interact with the system. The
>> two main components in the system are a Issues and Followups. An Issue is
>> created by our technical support staff when some form of action needs to be
>> taken in order to resolve a problem. FollowUps are entries entered against
>> an issue depicting the sequence of events taken to resolve the issue. There
>> are about 15,000 Issues in the system at present and about 95,000 FollowUps
>> in the system. As we need the system to be very responsive, each query
>> should ideally run in under 1 second.
>>
>> A lot of the reports our technical officers submit to us include a listing
>> of all actioned issues for a given day along with the last modified followup
>> of each said issue. With the number of rows in our database increasing at a
>> high rate, these queries are starting to run too slowly.
>>
>> Here is a condensed version of the two tables:
>>
>> Issues:
>> =================================
>> id - integer
>> dt_created - timestamp
>> dt_modified - timestamp
>> t_title - varchar
>> t_description - varchar
>>
>> FollowUps:
>> =================================
>> id - integer
>> dt_created - timestamp
>> dt_modified - timestamp
>> t_description - varchar
>> n_issue - foregin key to issues
>>
>> We have identified that the slowness in our queries is trying to return
>> the lastest followup for each actioned issue that day. Without further ado
>> here are two variations I have tried within the system (neither of which are
>> making the cut):
>>
>> V1 (correlated subquery - Very bad performance)
>>
>> (SELECT
>> fu.*
>> FROM
>> manage_followup fu,
>> manage_issue i
>> WHERE
>> i.id = fu.n_issue
>> AND
>> fu.id = (SELECT
>> id
>> FROM
>> manage_followup
>> WHERE
>> n_issue = i.id
>> ORDER BY
>> dt_modified DESC
>> LIMIT 1)) AS latestfu,
>>
>> V2 (Using Group By, "max" aggregate function and distinct- better
>> performance, but still bad because of distinct)
>>
>>
>> SELECT DISTINCT ON (fu.n_issue)
>> fu.id,
>> fu.dt_created,
>> fu.dt_modified,
>> fu.t_description,
>> fu.n_issue as issue_id
>> FROM
>> manage_followup fu,
>> (SELECT
>> n_issue,
>> max(dt_modified) as dt_modified
>> FROM
>> manage_followup
>> GROUP BY
>> n_issue) as max_modified
>> WHERE
>> max_modified.n_issue = fu.n_issue
>> AND
>> fu.dt_modified = max_modified.dt_modified)
>> AS latestfu ON (latestfu.issue_id = i.id),
>>
>> We must use distinct here as we sometimes use batch scripts to enter
>> followups, which will give them all similar, if not equal, modification
>> dates. We also can't use followup ids as an indicator of the latest followup
>> because users of the system can retrospectively go back and change older
>> followups.
>>
>> I was hoping some one could provide a solution that does not require a
>> corrolated subquery or make use of the distinct keyword. Any help would be
>> much appreciated.
>>
>> Kind regards,
>> Sebastian
>>
>>
>>
>>
>>
>>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Huxton 2008-11-10 12:03:22 Re: Query optimizing
Previous Message Helio Campos Mello de Andrade 2008-11-10 11:48:49 Re: Query optimizing