Quick Links

Any disadvantages of using =ANY(ARRAY()) instead of IN?

From:	Clemens Eisserer <linuxhippy(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Any disadvantages of using =ANY(ARRAY()) instead of IN?
Date:	2012-05-01 14:34:10
Message-ID:	CAFvQSYTNtqmd5F9ZWbmxGwm2A4UGoanSuD3p=0f+yzFwpKcq6g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi,

I am using postgresql as database for a hibernate based java oltp
project and as in previous projects am totally impressed by
postgresql's robustness, performance and feature-richness. Thanks for
this excellent piece of software.

Quite often Hibernate ends up generating queries with a lot of joins
which usually works well, except for queries which load some
additional data based on a previous query (SUBSELECT collections),
which look like:

select ..... from table1 ... left outer join table 15 .... WHERE
table1.id IN (select id .... join table16 ... join table20 WHERE
table20.somevalue=?)

Starting with some amount of joins, the optimizer starts to do quite
suboptimal things like hash-joining huge tables where selctivity would
very low.
I already raised join_collapse_limit and from_collapse_limit, but
after a certain point query planning starts to become very expensive.

However, when using " =ANY(ARRAY(select ...))" instead of "IN" the
planner seems to do a lot better, most likely because it treats the
subquery as a black-box that needs to be executed independently. I've
hacked hibernate a bit to use ANY+ARRAY, and it seems to work a lot
better than using "IN".

However, I am a bit uncertain:
- Is it safe to use ANY(ARRAY(select ...)) when I know the sub-query
will only return a small amount (0-100s) of rows?
- Shouldn't the optimizer be a bit smarter avoiding optimizing this
case in the first place, instead of bailing out later? Should I file a
bug-report about this problem?

Thank you in advance, Clemens

Responses

Re: Any disadvantages of using =ANY(ARRAY()) instead of IN? at 2012-05-01 14:43:58 from Tom Lane
Re: Any disadvantages of using =ANY(ARRAY()) instead of IN? at 2012-05-10 08:20:39 from Noah Misch

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2012-05-01 14:43:58	Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Previous Message	Merlin Moncure	2012-05-01 13:43:00	Re: Tuning Postgres 9.1 on Windows