Query optimizing

From: "Sebastian Ritter" <ritter(dot)sebastian(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Query optimizing
Date: 2008-11-10 10:43:14
Message-ID: 99b656cb0811100243v499884f9ra457233c5a168521@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi all,

I was hoping to receive some advise on a slow running query in our business'
Issue Tracking System. To shed some light on the below mentioned queries,
here is a brief summary of how users interact with the system. The two main
components in the system are a Issues and Followups. An Issue is created by
our technical support staff when some form of action needs to be taken in
order to resolve a problem. FollowUps are entries entered against an issue
depicting the sequence of events taken to resolve the issue. There are about
15,000 Issues in the system at present and about 95,000 FollowUps in the
system. As we need the system to be very responsive, each query should
ideally run in under 1 second.

A lot of the reports our technical officers submit to us include a listing
of all actioned issues for a given day along with the last modified followup
of each said issue. With the number of rows in our database increasing at a
high rate, these queries are starting to run too slowly.

Here is a condensed version of the two tables:

Issues:
=================================
id - integer
dt_created - timestamp
dt_modified - timestamp
t_title - varchar
t_description - varchar

FollowUps:
=================================
id - integer
dt_created - timestamp
dt_modified - timestamp
t_description - varchar
n_issue - foregin key to issues

We have identified that the slowness in our queries is trying to return the
lastest followup for each actioned issue that day. Without further ado here
are two variations I have tried within the system (neither of which are
making the cut):

V1 (correlated subquery - Very bad performance)

(SELECT
fu.*
FROM
manage_followup fu,
manage_issue i
WHERE
i.id = fu.n_issue
AND
fu.id = (SELECT
id
FROM
manage_followup
WHERE
n_issue = i.id
ORDER BY
dt_modified DESC
LIMIT 1)) AS latestfu,

V2 (Using Group By, "max" aggregate function and distinct- better
performance, but still bad because of distinct)

SELECT DISTINCT ON (fu.n_issue)
fu.id,
fu.dt_created,
fu.dt_modified,
fu.t_description,
fu.n_issue as issue_id
FROM
manage_followup fu,
(SELECT
n_issue,
max(dt_modified) as dt_modified
FROM
manage_followup
GROUP BY
n_issue) as max_modified
WHERE
max_modified.n_issue = fu.n_issue
AND
fu.dt_modified = max_modified.dt_modified)
AS latestfu ON (latestfu.issue_id = i.id),

We must use distinct here as we sometimes use batch scripts to enter
followups, which will give them all similar, if not equal, modification
dates. We also can't use followup ids as an indicator of the latest followup
because users of the system can retrospectively go back and change older
followups.

I was hoping some one could provide a solution that does not require a
corrolated subquery or make use of the distinct keyword. Any help would be
much appreciated.

Kind regards,
Sebastian

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Huxton 2008-11-10 11:05:27 Re: Query optimizing
Previous Message Devil™ Dhuvader 2008-11-08 17:12:38 Re: need help in building a query