I am using postgresql as database for a hibernate based java oltp
project and as in previous projects am totally impressed by
postgresql's robustness, performance and feature-richness. Thanks for
this excellent piece of software.
Quite often Hibernate ends up generating queries with a lot of joins
which usually works well, except for queries which load some
additional data based on a previous query (SUBSELECT collections),
which look like:
select ..... from table1 ... left outer join table 15 .... WHERE
table1.id IN (select id .... join table16 ... join table20 WHERE
Starting with some amount of joins, the optimizer starts to do quite
suboptimal things like hash-joining huge tables where selctivity would
I already raised join_collapse_limit and from_collapse_limit, but
after a certain point query planning starts to become very expensive.
However, when using " =ANY(ARRAY(select ...))" instead of "IN" the
planner seems to do a lot better, most likely because it treats the
subquery as a black-box that needs to be executed independently. I've
hacked hibernate a bit to use ANY+ARRAY, and it seems to work a lot
better than using "IN".
However, I am a bit uncertain:
- Is it safe to use ANY(ARRAY(select ...)) when I know the sub-query
will only return a small amount (0-100s) of rows?
- Shouldn't the optimizer be a bit smarter avoiding optimizing this
case in the first place, instead of bailing out later? Should I file a
bug-report about this problem?
Thank you in advance, Clemens
pgsql-performance by date
|Next:||From: Tom Lane||Date: 2012-05-01 14:43:58|
|Subject: Re: Any disadvantages of using =ANY(ARRAY()) instead of IN? |
|Previous:||From: Merlin Moncure||Date: 2012-05-01 13:43:00|
|Subject: Re: Tuning Postgres 9.1 on Windows|