Re: hash semi join caused by "IN (select ...)"

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: hash semi join caused by "IN (select ...)"
Date: 2011-05-18 14:00:50
Message-ID: BANLkTin6+VVuZHtQMS3WmS3+qm+o1GR5aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 18, 2011 at 1:46 AM, Clemens Eisserer <linuxhippy(at)gmail(dot)com>wrote:

> Hi,
>
> Does anybody know why the planner treats "= ANY(ARRAY(select ...))"
> differently than "IN(select ...)"?
> Which one is preferable, when I already have a lot of joins?
>
> Thanks, Clemens
>
> 2011/5/17 Clemens Eisserer <linuxhippy(at)gmail(dot)com>:
> > Hi,
> >
> >>> select .... from t1 left join t2 .... WHERE id IN (select ....)
> >>
> >> Does it work as expected with one less join? If so, try increasing
> >> join_collapse_limit ...
> >
> > That did the trick - thanks a lot. I only had to increase
> > join_collapse_limit a bit and now get an almost perfect plan.
> > Instead of hash-joining all the data, the planner generates
> > nested-loop-joins with index only on the few rows I fetch.
> >
> > Using = ANY(array(select... )) also seems to work, I wonder which one
> > works better. Does ANY(ARRAY(...)) force the optimizer to plan the
> > subquery seperated from the main query?
> >
> > Thanks, Clemens
> >
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

I'm just a user so I don't have definitive knowledge of this, but my
experience seems to indicate that the = ANY(ARRAY(SELECT ...)) does the
select and turns it into an array and then uses that in the where clause in
a manner similar to a hard coded list of values, like IN (1, 2, 3, ...). In
theory, the planner could do the same sort of things with the IN (SELECT
...) but my experience seems to indicate that in some cases it decides not
to use an index that it could.

One specific example I know of is that at least in PostgreSQL 8.3, a view
with a UNION/UNION ALL will push the = ANY(ARRAY(SELECT ...)) down into the
two sub-queries, but the IN (SELECT ...) will be applied after the UNION
ALL.

Dave

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Johansen 2011-05-18 15:26:02 LIMIT and UNION ALL
Previous Message Clemens Eisserer 2011-05-18 08:46:00 Re: hash semi join caused by "IN (select ...)"