FW: query optimization question

From: <terry(at)ashtonwoodshomes(dot)com>
To: "Postgresql Sql Group (E-mail)" <pgsql-sql(at)postgresql(dot)org>
Subject: FW: query optimization question
Date: 2002-11-06 14:01:49
Message-ID: 002601c2859d$0d825480$2766f30a@development.greatgulfhomes.com
Lists: pgsql-sql

No offence taken, but that is incorrect: my SQL is pretty good. I
received no other responses, and I later realized the solution to my
question:

(EXPERTS READ ON: If anyone can show me how to use a group by or otherwise
optimize I would be grateful)

This subquery:
SELECT project_id, marketing_name,
(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
#PreserveSingleQuotes(variables.base_query)#
) AS def_count,
actually does return a deficiency count, since there can be more than 1
deficiency per lot. In order to get my lot_count (the number of lots with 1
or more deficiencies), I just needed to add DISTINCT inside my count()
aggregate, i.e. SELECT count(DISTINCT lots.lot_id)... I forgot one could do
that:
(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id
AND dt.days_old_start_date < #CreateODBCDate(DateAdd("d", -int(ListLast(variables.aging_breakdown_list, ",")), now()))#
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
#PreserveSingleQuotes(variables.base_query)#
) AS lot_count_greater_#ListLast(variables.aging_breakdown_list, ",")#,
Note that #PreserveSingleQuotes(variables.base_query)# is dynamic code that
further restricts the deficiencies by various criteria, e.g. to a particular
supplier.
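
A minimal sketch of the count() versus count(DISTINCT) difference described above. It runs against an in-memory SQLite database purely so that it is self-contained; the single table, its sample rows, and the deficiency_id column are assumptions made for illustration, not the real schema.

```python
import sqlite3

# Hypothetical miniature of the joined rows: one row per deficiency.
# deficiency_id is an illustrative column, not taken from the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deficiency_table (deficiency_id INTEGER, lot_id TEXT)")
conn.executemany(
    "INSERT INTO deficiency_table VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B")],  # lot A: 3 deficiencies, lot B: 1
)

# count(lot_id) counts every deficiency row;
# count(DISTINCT lot_id) counts each lot only once.
def_count, lot_count = conn.execute(
    "SELECT count(lot_id), count(DISTINCT lot_id) FROM deficiency_table"
).fetchone()

print(def_count, lot_count)
```

With this sample data the first number is the deficiency total and the second is the number of lots that have at least one deficiency.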

This query is actually dynamic. If all I had to do was the above 2
subqueries, then I most certainly COULD do a group by.

However, I am then splitting the total deficiencies into aging groups,
e.g. <30, 30-60, 60-90, and >90 days old. The query for that is pasted
below. I would like to optimize it, and if I could do so with a group by
clause I most certainly would, but I don't see how I can BECAUSE OF THE
AGING BREAKDOWN:

SELECT project_id, marketing_name,
(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS def_count,

(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date >= {d '2002-10-07'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS def_count_less_30,

(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-10-07'}
AND dt.days_old_start_date >= {d '2002-09-07'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS def_count_30_60,

(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-09-07'}
AND dt.days_old_start_date >= {d '2002-08-08'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS def_count_60_90,

(SELECT count(lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-08-08'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS def_count_greater_90,

(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date >= {d '2002-10-07'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS lot_count_less_30,

(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-10-07'}
AND dt.days_old_start_date >= {d '2002-09-07'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS lot_count_30_60,

(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-09-07'}
AND dt.days_old_start_date >= {d '2002-08-08'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS lot_count_60_90,

(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id

AND dt.days_old_start_date < {d '2002-08-08'}
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS lot_count_greater_90,

(SELECT count(DISTINCT lots.lot_id) AS lot_count
FROM deficiency_table AS dt, lots, deficiency_status AS ds
WHERE dt.lot_id = lots.lot_id
AND lots.division_id = proj.division_id
AND lots.project_id = proj.project_id
AND dt.deficiency_status_id = ds.deficiency_status_id
AND ds.is_outstanding
AND dt.assigned_supplier_id = '101690'
) AS lot_count
FROM projects AS proj
WHERE proj.division_id = 'GGH'
AND NOT EXISTS (SELECT 1 FROM menu_group_projects WHERE menu_code = 'WA'
AND division_id = proj.division_id AND project_id = proj.project_id AND
status = 'I')
ORDER BY proj.project_id

If anyone can see a way to do this with a group by, I will be happy to
hear about it, because currently each subquery has to do its own
(sequential or index) scan of the deficiencies table. The only way I can
see to use a group by would be to break out the aging categories into
separate queries, but that wins me nothing because each query then does its
own scan...
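
One possible way to get the aging breakdown out of a single GROUP BY is conditional aggregation: put each aging test inside a CASE expression within the aggregate instead of in its own subquery. The sketch below is only an illustration under assumptions: it runs on an in-memory SQLite database so that it is self-contained, the table definitions and sample rows are simplified guesses at the schema, the date cutoffs are passed as plain ISO strings, and the menu_group_projects filter and marketing_name column are left out for brevity. The same CASE-inside-count() constructs are standard SQL and should carry over to PostgreSQL, where every selected non-aggregate column would also need to appear in the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed table definitions; the real schema is not shown in the post.
conn.executescript("""
CREATE TABLE projects (division_id TEXT, project_id TEXT, marketing_name TEXT);
CREATE TABLE lots (lot_id INTEGER, division_id TEXT, project_id TEXT);
CREATE TABLE deficiency_status (deficiency_status_id INTEGER, is_outstanding INTEGER);
CREATE TABLE deficiency_table (lot_id INTEGER, deficiency_status_id INTEGER,
                               days_old_start_date TEXT, assigned_supplier_id TEXT);
INSERT INTO projects VALUES ('GGH', 'X', 'Project X'), ('GGH', 'Y', 'Project Y');
INSERT INTO deficiency_status VALUES (1, 1), (2, 0);
""")

# Made-up sample: (project, lot_id, start date, number of deficiencies on the lot)
sample = [("X", 1, "2002-10-20", 5), ("X", 2, "2002-09-20", 5),
          ("X", 3, "2002-09-15", 5), ("X", 4, "2002-08-01", 5),
          ("Y", 5, "2002-10-25", 3), ("Y", 6, "2002-09-10", 3)]
for project_id, lot_id, start, n in sample:
    conn.execute("INSERT INTO lots VALUES (?, 'GGH', ?)", (lot_id, project_id))
    conn.executemany("INSERT INTO deficiency_table VALUES (?, 1, ?, '101690')",
                     [(lot_id, start)] * n)

# A CASE without ELSE yields NULL, and count() ignores NULLs, so each
# aggregate only counts the rows that fall into its aging bucket.
QUERY = """
SELECT proj.project_id,
       count(DISTINCT CASE WHEN dt.days_old_start_date >= :d30
                           THEN dt.lot_id END) AS lot_count_less_30,
       count(DISTINCT CASE WHEN dt.days_old_start_date < :d30
                            AND dt.days_old_start_date >= :d60
                           THEN dt.lot_id END) AS lot_count_30_60,
       count(DISTINCT CASE WHEN dt.days_old_start_date < :d60
                           THEN dt.lot_id END) AS lot_count_60_plus,
       count(DISTINCT dt.lot_id) AS lot_count,
       count(CASE WHEN dt.days_old_start_date >= :d30 THEN 1 END) AS def_count_less_30,
       count(CASE WHEN dt.days_old_start_date < :d30
                   AND dt.days_old_start_date >= :d60 THEN 1 END) AS def_count_30_60,
       count(CASE WHEN dt.days_old_start_date < :d60 THEN 1 END) AS def_count_60_plus,
       count(dt.lot_id) AS def_count
FROM projects AS proj
LEFT JOIN lots ON lots.division_id = proj.division_id
              AND lots.project_id = proj.project_id
LEFT JOIN (SELECT d.lot_id, d.days_old_start_date
           FROM deficiency_table AS d
           JOIN deficiency_status AS ds
             ON ds.deficiency_status_id = d.deficiency_status_id
           WHERE ds.is_outstanding
             AND d.assigned_supplier_id = '101690') AS dt
       ON dt.lot_id = lots.lot_id
WHERE proj.division_id = 'GGH'
GROUP BY proj.project_id
ORDER BY proj.project_id
"""

# Cutoffs 30 and 60 days before 2002-11-06, as in the static query.
rows = conn.execute(QUERY, {"d30": "2002-10-07", "d60": "2002-09-07"}).fetchall()
for row in rows:
    print(row)
```

The LEFT JOINs keep projects that have no matching deficiencies, so they still appear with zero counts, as they do with the correlated subqueries. Whether this is actually faster on the real data would have to be checked with EXPLAIN; the intent is that the deficiencies are joined once instead of once per subquery.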

The expected simplified output of this query looks like this:

Project   <30  30-60  >=60  lot total   <30  30-60  >=60  def total
X           1      2     1          4     5     10     5         20   (if X had 4 lots, each of 5 deficiencies)
Y           1      1     0          2     3      3     0          6   (each has eg 3 deficiencies in project Y)

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry(at)greatgulfhomes(dot)com

> -----Original Message-----
> From: ch(at)rodos(dot)fzk(dot)de [mailto:ch(at)rodos(dot)fzk(dot)de]
> Sent: Wednesday, November 06, 2002 4:54 AM
> To: terry(at)ashtonwoodshomes(dot)com
> Subject: Re: [SQL] query optimization question
>
>
>
> Dear Terry,
> When I was reading the objective of your query, I expected at least one
> GROUP BY clause within. I do not intend to be offensive - not at all,
> but your query very much looks like you're lacking in basic SQL knowledge
> (did you receive any other reply?).
> The clause
> WHERE ...
> AND division_id = proj.division_id AND project_id = proj.project_id ...
>
> is leading to a JOIN of your projects table to itself.
> I'm pretty sure that's the main reason why the query is slow.
> As I understand your database table design, there are relations about
> divisions, projects, lots, and deficiencies of lots. And you are
> running a master database for all of them.
> I've tried to write two queries (see below) to retrieve the information
> you want (BTW I think your first subquery counts the total number of
> lots within the project but not the total number of deficiencies).
> Both queries may still run slow because three tables have to be joined
> (Please try them within the 'psql' interactive terminal first).
> Also, they may not work at all (I could not verify them as I did not
> know about your CREATE TABLE statements and did not have data to put in).
> I'm willing to help, so if it's not working this information would be
> very useful to me. I am no SQL guru, so I cannot see any way to put these
> two into one. But this looks like an interesting task, maybe we should
> put this topic to the list again as soon as we make the single ones run.
>
> Probably, you'll need to create several indexes to speed up.
>
> -- for each project, the total number of deficiencies
> SELECT p.project_id, p.marketing_name, COUNT(d.lot_id) AS def_count
> FROM projects AS p,
> lots AS l LEFT JOIN deficiency_table AS d
> ON ( d.lot_id = l.lot_id )
> WHERE l.division_id = p.division_id
> AND p.division_id = '#variables.local_division_id#'
> GROUP BY p.project_id, p.marketing_name ;
>
> -- for each project, the total number of lots with 1 or more deficiencies
> SELECT p.project_id, p.marketing_name, COUNT(l.lot_id) AS def_lot_count
> FROM projects AS p,
> lots AS l LEFT JOIN deficiency_table AS d
> ON ( d.lot_id = l.lot_id )
> WHERE l.division_id = p.division_id
> AND p.division_id = '#variables.local_division_id#'
> GROUP BY p.project_id, p.marketing_name HAVING COUNT(d.lot_id) > 0 ;
>
> Once again, no offence intended, but I recommend reading a book on SQL
> soon.
>
> Regards, Christoph
>
> >
> > The query below is slow because both the lots table and the
> > deficiency_table table have thousands of records. Can anyone tell me
> > how to do the second subselect (lot_count) by some method of a join
> > instead of a sub-subselect OR any other method I can use to optimize
> > this query to make it faster?
> >
> > The objective of the query is: Tell me for each project, the total
> > number of deficiencies in the project, and the total number of lots
> > with 1 or more deficiencies in the project.
> >
> > SELECT project_id, marketing_name,
> > (SELECT COUNT(lots.lot_id) AS lot_count
> > FROM deficiency_table AS dt, lots
> > WHERE dt.lot_id = lots.lot_id
> > AND lots.division_id = proj.division_id
> > AND lots.project_id = proj.project_id
> > ) AS def_count,
> > (SELECT COUNT(lots.lot_id) AS lot_counter
> > FROM lots
> > WHERE lots.division_id = proj.division_id
> > AND lots.project_id = proj.project_id
> > AND EXISTS (SELECT 1 FROM deficiency_table AS dt WHERE dt.lot_id = lots.lot_id)
> > ) AS lot_count
> > FROM projects AS proj
> > WHERE proj.division_id = '#variables.local_division_id#'
> > AND NOT EXISTS (SELECT 1 FROM menu_group_projects WHERE menu_code = 'WA'
> > AND division_id = proj.division_id AND project_id = proj.project_id
> > AND status = 'I')
> > ORDER BY proj.project_id
> >
> > Thanks in advance
> >
> > Terry Fielder
> > Network Engineer
> > Great Gulf Homes / Ashton Woods Homes
> > terry(at)greatgulfhomes(dot)com
> >
>
