Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?

From: Clemens Eisserer <linuxhippy(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Date: 2012-05-13 14:35:30
Message-ID: CAFvQSYTwLqvPOF9iLycrAsCq0fNJ8bfOjPvbbZhc70maA=TpzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Noah,

Thanks a lot for your feedback and explanations.

> Since you have 15+ tables at the top level, the genetic query optimizer should
> be kicking in and delivering a plan in reasonable time, albeit with plan
> quality hazards.  There's a danger zone when the deterministic planner is
> still in effect but {from,join}_collapse_limit have limited the scope of its
> investigation.  If you're in that zone and have not hand-tailored your
> explicit join order, poor plans are unsurprising.  What exact configuration
> changes are you using?

Basically only the changes, suggested here a year ago, which made the
problem go away for less complex queries:

geqo_threshold = 20
from_collapse_limit = 13
join_collapse_limit = 13

> Hundreds of rows, no.  Consider this example:
> IN(...):
>  Total runtime: 2200.767 ms
>
> ANY(ARRAY(...)):
>  Total runtime: 11748.348 ms

In case there is an index on C, the resulting index scan is, even with
1000 elements, 3 times faster on my Notebook.
However, both queries execute in next-to-no time (15 vs 5ms).

> Filing a bug report with the content you've already posted would not add much,
> but a self-contained test case could prove useful.  Many of the deficiencies
> that can make ANY(ARRAY(...)) win do represent unimplemented planner
> intelligence more than bugs.
>
> Incidentally, you can isolate whether ANY(ARRAY(...))'s advantage comes solely
> from suppressing the subquery collapse.  Keep "IN" but tack "OFFSET 0" onto
> the subquery.  If this gives the same performance as ANY(ARRAY(...)), then the
> subquery-collapse suppression was indeed the source of advantage.

I see your point, some dumb logic to replace IN with ANY(ARRAY
wouldn't always yield better results.
I'll try to come up with a self-containing testcase.

Thanks again, Clemens

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2012-05-13 16:01:06 Re: Maximum number of sequences that can be created
Previous Message Robert Klemme 2012-05-13 10:56:20 Re: Maximum number of sequences that can be created