Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Kevin Goess *EXTERN*" <kgoess(at)bepress(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: WHERE IN (subselect) versus WHERE IN (1,2,3,)
Date: 2012-03-19 08:24:37
Message-ID: D960CB61B694CF459DCFB4B0128514C207A2AC0D@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kevin Goess wrote:
> We have a table "contexts" with 1.6 million rows, and a table
"articles" with 1.4 million rows, where
> an "article" is a particular kind of "context". We want to select
from a join on those two tables
> like this
>
> SELECT COUNT(*)
> FROM contexts
> JOIN articles ON (articles.context_key=contexts.context_key)
> WHERE contexts.context_key IN (...);
> /* and some combination of columns from articles and contexts */
>
> If "IN(...)" is a query, then this guy does a seq scan on the contexts
table, even if the subquery is
> "select col_a from kgtest" where kgtest has one row. If however I
read the ids beforehand and write
> them into the query, a la "IN (111,222,333...)", then the everything
is happy, up to at least 20,000
> values written into the sql, at which point smaller machines will take
2-5 minutes to parse the query.
>
> I can certainly write the ids inline into the SQL, but when I do that
I get the distinct impression
> that I'm Doing It Wrong. Is this expected behavior? It seems
surprising to me.
>
>
> To demonstrate:
>
> /* nothing up my sleeve */
> # select * from kgtest;
> cola
> ---------
> 1652729
> (1 row)

[...]

> /* subselect, query plan does seq scan on contexts */
[...]
> -> Seq Scan on kgtest (cost=0.00..34.00 rows=2400 width=4)
(actual time=0.048..0.050 rows
[...]

There is something missing in this line, but according to what you wrote
it must be "actual [...] rows=1", And yet the planner assumes that the
scan will return 2400 rows.
That means that your statistics are not accurate.

As a first measure, you should ANALYZE the tables involved and see if
the problem persists. If yes, post the new plans.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sergey Konoplev 2012-03-19 08:58:08 Re: Multi server query
Previous Message Janning Vygen 2012-03-19 08:09:49 Re: Anonymized database dumps