Re: NOT IN Optmization

From: David Wheeler <david(at)wheeler(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, <sfpug(at)postgresql(dot)org>
Subject: Re: NOT IN Optmization
Date: 2003-08-09 20:05:08
Message-ID: C69C6242-CAA4-11D7-B37F-0003931A964A@wheeler.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

On Saturday, August 9, 2003, at 11:20 AM, Josh Berkus wrote:

> IME, NOT EXISTS works better than NOT IN on all 7.1 --> 7.3 for any
> subquery
> result which is consistently 20 values or more.

Ah, I actually expect this query to generally return _no_ records, and
to return fewer than 10 when it does return results. If I'm reading
what you're saying correctly, then I'm probably better off with the NOT
IN, and it will work great on 7.4, anyway, yes?

> HOWEVER, the subquery estimate for EXISTS clauses up to 7.4 is a hack
> ...
> pretty much the planner assumes that there will be 50% correlation,
> which can
> often result in an unnecessary seq scan (mind you, NOT IN is no better
> for
> large lists). So it pays to "overdetermine" your criteria in the main
> query;
> that is, add WHERE clauses to the main query even where they are
> reduntant
> with the subquery to give the planner the right idea.

So, if that's true for NOT IN, as well, then perhaps I should do
something like this?

SELECT r.id, r.parent_id, r.path, r.uri, r.size, r.mod_time
FROM resource r, media_type t, job__resource jr
WHERE r.media_type__id = t.id
AND r.id = jr.resource__id
AND r.id NOT IN
(SELECT resource__id
FROM job__resource
WHERE job__id = 1027)
ORDER BY path

So I've added the join and of course its required WHERE statement. It's
redundant, as you say, but the NOT IN should ensure it still returns
the correct records. Is this what you meant?

Stephan and Josh, I appreciate your advice!

Regards,

David

--
David Wheeler AIM: dwTheory
david(at)kineticode(dot)com ICQ: 15726394
http://www.kineticode.com/ Yahoo!: dew7e
Jabber: Theory(at)jabber(dot)org
Kineticode. Setting knowledge in motion.[sm]

In response to

Browse sfpug by date

  From Date Subject
Next Message Sean Chittenden 2003-08-12 03:50:32 Re: optimizing selects on time-series data in Pg
Previous Message Josh Berkus 2003-08-09 18:20:06 Re: NOT IN Optmization