Re: hash semi join caused by "IN (select ...)"

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Clemens Eisserer <linuxhippy(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hash semi join caused by "IN (select ...)"
Date: 2011-05-18 17:41:40
Message-ID: C9F95085.373D6%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5/17/11 12:38 AM, "Clemens Eisserer" <linuxhippy(at)gmail(dot)com> wrote:

>Hi,
>
>>> select .... from t1 left join t2 .... WHERE id IN (select ....)
>>
>> Does it work as expected with one less join? If so, try increasing
>> join_collapse_limit ...
>
>That did the trick - thanks a lot. I only had to increase
>join_collapse_limit a bit and now get an almost perfect plan.
>Instead of hash-joining all the data, the planner generates
>nested-loop-joins with index only on the few rows I fetch.
>
>Using = ANY(array(select... )) also seems to work, I wonder which one
>works better. Does ANY(ARRAY(...)) force the optimizer to plan the
>subquery seperated from the main query?

I'm not sure exactly what happens with ANY(ARRAY()).

I am fairly confident that the planner simply transforms an IN(select ...)
to a join, since they are equivalent.

Because "foo IN (select ...)" is just a join, it counts towards
join_collapse_limit.

>
>Thanks, Clemens
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2011-05-19 03:00:31 Re: reducing random_page_cost from 4 to 2 to force index scan
Previous Message Pavel Stehule 2011-05-18 15:59:47 Re: LIMIT and UNION ALL