Re: optimizing queries using IN and EXISTS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nick Hofstede <Nick(dot)Hofstede(at)inventivegroup(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: optimizing queries using IN and EXISTS
Date: 2012-07-18 22:36:14
Message-ID: 3691.1342650974@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Nick Hofstede <Nick(dot)Hofstede(at)inventivegroup(dot)com> writes:
> I'm surprised at the difference in speed/execution plan between two logically equivalent queries, one using IN, the other using EXISTS. (At least I think they are logically equivalent)

> SELECT *
> FROM foo
> WHERE 'text6' IN (SELECT value
> FROM bar
> JOIN foo AS foo2
> ON bar.foo_ref = foo2.id
> WHERE foo2.id = foo.id)

Hm. convert_ANY_sublink_to_join() rejects subqueries that contain any
Vars of the parent query level, so the reference to foo.id prevents this
from being converted to a semijoin. However, it seems like that's
overly restrictive. I'm not sure that we could remove the test
altogether, but at least outer vars used in WHERE seem safe.

In the meantime, you can recast like this:

SELECT *
FROM foo
WHERE ('text6', id) IN (SELECT value, foo2.id
FROM bar
JOIN foo AS foo2
ON bar.foo_ref = foo2.id)

and still get a semijoin plan from an IN-style query.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Kerr 2012-07-19 01:43:28 Re: Process 11812 still waiting for ExclusiveLock on extension of relation
Previous Message Simon Riggs 2012-07-18 21:45:08 Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)