On Saturday, August 9, 2003, at 11:20 AM, Josh Berkus wrote:
> IME, NOT EXISTS works better than NOT IN on all 7.1 --> 7.3 for any
> result which is consistently 20 values or more.
Ah, I actually expect this query to generally return _no_ records, and
to return fewer than 10 when it does return results. If I'm reading
what you're saying correctly, then I'm probably better off with the NOT
IN, and it will work great on 7.4, anyway, yes?
> HOWEVER, the subquery estimate for EXISTS clauses up to 7.4 is a hack
> pretty much the planner assumes that there will be 50% correlation,
> which can
> often result in an unnecessary seq scan (mind you, NOT IN is no better
> large lists). So it pays to "overdetermine" your criteria in the main
> that is, add WHERE clauses to the main query even where they are
> with the subquery to give the planner the right idea.
So, if that's true for NOT IN, as well, then perhaps I should do
something like this?
SELECT r.id, r.parent_id, r.path, r.uri, r.size, r.mod_time
FROM resource r, media_type t, job__resource jr
WHERE r.media_type__id = t.id
AND r.id = jr.resource__id
AND r.id NOT IN
WHERE job__id = 1027)
ORDER BY path
So I've added the join and of course its required WHERE statement. It's
redundant, as you say, but the NOT IN should ensure it still returns
the correct records. Is this what you meant?
Stephan and Josh, I appreciate your advice!
David Wheeler AIM: dwTheory
david(at)kineticode(dot)com ICQ: 15726394
http://www.kineticode.com/ Yahoo!: dew7e
Kineticode. Setting knowledge in motion.[sm]
In response to
sfpug by date
|Next:||From: Sean Chittenden||Date: 2003-08-12 03:50:32|
|Subject: Re: optimizing selects on time-series data in Pg|
|Previous:||From: Josh Berkus||Date: 2003-08-09 18:20:06|
|Subject: Re: NOT IN Optmization|