Skip site navigation (1) Skip section navigation (2)

Re: Possibly slow query

From: "Peter Darley" <pdarley(at)kinesis-cem(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "Pgsql-Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possibly slow query
Date: 2005-01-26 15:16:25
Message-ID: PDEOIIFFBIAABMGNJAGPMEPKDLAA.pdarley@kinesis-cem.com (view raw or flat)
Thread:
Lists: pgsql-performance
Richard,
	I tried a left join, which has to be a little weird, because there may or
may not be a corresponding row in Assignment_Settings for each Assignment,
and they may or may not have Setting='Status', so I came up with:

SELECT User_ID
FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings
WHERE Setting='Status') ASet
WHERE A.User_ID IS NOT NULL
	AND ASet.Assignment_ID IS NULL
GROUP BY User_ID;

	Which explain analyze is saying takes 0.816 ms as compared to 0.163 ms for
my other query.  So, I'm not sure that I'm writing the best LEFT JOIN that I
can.  Also, I suspect that these ratios wouldn't hold as the data got bigger
and started using indexes, etc.  I'll mock up a couple of tables with a
bunch of data and see how things go.  It would be nice to understand WHY I
get the results I get, which I'm not sure I will.

	I'm not sure what you mean by selecting a distinct User_ID first.  Since
I'm joining the tables on Assignment_ID, I'm not sure how I'd do a distinct
before the join (because I'd lose Assignment_ID).  I was also under the
impression that group by was likely to be faster than a distinct, tho I
can't really recall where I got that idea from.

Thanks for your suggestions!
Peter Darley

-----Original Message-----
From: Richard Huxton [mailto:dev(at)archonet(dot)com]
Sent: Wednesday, January 26, 2005 1:36 AM
To: Peter Darley
Cc: Pgsql-Performance
Subject: Re: [PERFORM] Possibly slow query


Peter Darley wrote:
> Folks,
>
> 	I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out weather
a
> query I have is going to be slow when I have more information in my
tables.
> both tables involved will likely have ~500K rows within a year or so.
>
> 	Specifically I can't tell if I'm causing myself future problems with the
> subquery, and should maybe re-write the query to use a join.  The reason I
> went with the subquery is that I don't know weather a row in Assignments
> will have a corresponding row in Assignment_Settings
>
> 	The query is:
> SELECT User_ID
> FROM Assignments A
> WHERE A.User_ID IS NOT NULL
> 	AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
> Assignment_ID=A.Assignment_ID) IS NULL
> GROUP BY User_ID;

You could always use a LEFT JOIN instead, like you say. I'd personally
be tempted to select distinct user_id's then join, but it depends on how
many of each.

You're not going to know for sure whether you'll have problems without
testing. Generate 500k rows of plausible looking test-data and give it a
try.

--
   Richard Huxton
   Archonet Ltd


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2005-01-26 15:17:43
Subject: Re: PostgreSQL clustering VS MySQL clustering
Previous:From: Richard HuxtonDate: 2005-01-26 13:57:00
Subject: Re: [SQL] OFFSET impact on Performance???

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group