Skip site navigation (1) Skip section navigation (2)

Re: Query optimizing

From: "Sebastian Ritter" <ritter(dot)sebastian(at)gmail(dot)com>
To: "Helio Campos Mello de Andrade" <helio(dot)campos(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Query optimizing
Date: 2008-11-10 11:58:02
Message-ID: 99b656cb0811100358v71788c28s51d08d5512200b21@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-sql
Hi Helio,

Sorry about the parenthesis - Bad copy/pasting skills! To further discuss
your suggestion: Wouldn't adding n_issue=i.id as a where clause filter cause
the sub-query to become correlated and thus much less efficient ? I may be
wrong, or may have miss-understood your suggestion.

Thanks for you help,
Sebastian


On Mon, Nov 10, 2008 at 11:48 AM, Helio Campos Mello de Andrade <
helio(dot)campos(at)gmail(dot)com> wrote:

> Hi Sebastian,
>
>  - First of all i think there is an open-parenthesis missing in the query
> V2.
> Maybe in the V2 version you cold restrict the results in the INNER query a
> bit more if you use a restriction clause like "WHERE n_issue = i.id" in
> that. It will certainly lower the number of rows returned by it to only 1
> result.
>
> Regards
>
> --
> Helio Campos Mello de Andrade
>
>
>
> On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <
> ritter(dot)sebastian(at)gmail(dot)com> wrote:
>
>> Hi all,
>>
>> I was hoping to receive some advise on a slow running query in our
>> business' Issue Tracking System. To shed some light on the below mentioned
>> queries, here is a brief summary of how users interact with the system. The
>> two main components in the system are a Issues and Followups. An Issue is
>> created by our technical support staff when some form of action needs to be
>> taken in order to resolve a problem. FollowUps are entries entered against
>> an issue depicting the sequence of events taken to resolve the issue. There
>> are about 15,000 Issues in the system at present and about 95,000 FollowUps
>> in the system. As we need the system to be very responsive, each query
>> should ideally run in under 1 second.
>>
>> A lot of the reports our technical officers submit to us include a listing
>> of all actioned issues for a given day along with the last modified followup
>> of each said issue. With the number of rows in our database increasing at a
>> high rate, these queries are starting to run too slowly.
>>
>> Here is a condensed version of the two tables:
>>
>> Issues:
>> =================================
>> id  - integer
>> dt_created - timestamp
>> dt_modified - timestamp
>> t_title - varchar
>> t_description - varchar
>>
>> FollowUps:
>> =================================
>> id  - integer
>> dt_created - timestamp
>> dt_modified - timestamp
>> t_description - varchar
>> n_issue - foregin key to issues
>>
>> We have identified that the slowness in our queries is trying to return
>> the lastest followup for each actioned issue that day. Without further ado
>> here are two variations I have tried within the system (neither of which are
>> making the cut):
>>
>> V1 (correlated subquery - Very bad performance)
>>
>>  (SELECT
>>          fu.*
>>   FROM
>>         manage_followup fu,
>>         manage_issue i
>>   WHERE
>>          i.id = fu.n_issue
>>          AND
>>          fu.id = (SELECT
>>                         id
>>                    FROM
>>                         manage_followup
>>                     WHERE
>>                          n_issue = i.id
>>                     ORDER BY
>>                          dt_modified DESC
>>                     LIMIT 1)) AS latestfu,
>>
>> V2 (Using Group By, "max" aggregate function  and distinct- better
>> performance, but still bad because of distinct)
>>
>>
>> SELECT DISTINCT ON (fu.n_issue)
>>                                 fu.id,
>>                                 fu.dt_created,
>>                                 fu.dt_modified,
>>                                 fu.t_description,
>>                                 fu.n_issue as issue_id
>>                             FROM
>>                                 manage_followup fu,
>>                                 (SELECT
>>                                     n_issue,
>>                                     max(dt_modified) as dt_modified
>>                                  FROM
>>                                     manage_followup
>>                                  GROUP BY
>>                                     n_issue) as max_modified
>>                             WHERE
>>                                 max_modified.n_issue = fu.n_issue
>>                                 AND
>>                                 fu.dt_modified = max_modified.dt_modified)
>> AS latestfu ON (latestfu.issue_id = i.id),
>>
>> We must use distinct here as we sometimes use batch scripts to enter
>> followups, which will give them all similar, if not equal, modification
>> dates. We also can't use followup ids as an indicator of the latest followup
>> because users of the system can retrospectively go back and change older
>> followups.
>>
>> I was hoping some one could provide a solution that does not require a
>> corrolated subquery or make use of the distinct keyword. Any help would be
>> much appreciated.
>>
>> Kind regards,
>> Sebastian
>>
>>
>>
>>
>>
>>

In response to

pgsql-sql by date

Next:From: Richard HuxtonDate: 2008-11-10 12:03:22
Subject: Re: Query optimizing
Previous:From: Helio Campos Mello de AndradeDate: 2008-11-10 11:48:49
Subject: Re: Query optimizing

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group