Re: Very large IN-clause is slow, but how to rewrite it?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Richard Jones <rich(at)annexia(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Very large IN-clause is slow, but how to rewrite it?
Date: 2007-02-25 14:40:54
Message-ID: Pine.LNX.4.64.0702251740260.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Richard,

contrib/intarray may help you.

On Sun, 25 Feb 2007, Richard Jones wrote:

> I've been profiling a PG database / mix of applications and found that
> one statement which takes a very long time to execute is:
>
> select e.keywordid, e.quantity, e.max_cpc, i.position
> from bid3_events_impressions i, bid3_events e
> where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
> $8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,
> $27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,
> $45,$46,$47,$48,
> [... placeholders $49 thru $1908 omitted ...]
> $1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1917,$1918,$1919,$1920,
> $1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1929,$1930,$1931,$1932)
> order by e.keywordid, e.creativeid, e.t
>
> Needless to say this statement is being generated programatically.
>
> The problem is that the code needs to execute selects of this sort on
> various different number of keyword IDs quite frequently. I'm not
> sure how to rewrite it. If I put the keyword IDs into a temporary
> table then it is not at all clear that the overhead of doing each
> individual INSERT to populate the table won't be just as slow (the
> database is located across a network so there is a significant RTT,
> and COPY isn't supported by my PG lib).
>
> Has anyone got any suggestions?
>
> Rich.
>
> ----------------------------------------------------------------------
> explain select e.keywordid, e.quantity, e.max_cpc, i.position from bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t; QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=15795.79..15796.57 rows=312 width=34)
> Sort Key: e.keywordid, e.creativeid, e.t
> -> Hash Join (cost=11623.58..15782.87 rows=312 width=34)
> Hash Cond: ("outer".eventid = "inner".id)
> -> Seq Scan on bid3_events_impressions i (cost=0.00..3471.78 rows=136878 width=8)
> -> Hash (cost=11622.35..11622.35 rows=489 width=34)
> -> Seq Scan on bid3_events e (cost=0.00..11622.35 rows=489 width=34)
> Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid = 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
> (8 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2007-02-25 16:56:31 Re: Very large IN-clause is slow, but how to rewrite it?
Previous Message Richard Jones 2007-02-25 12:39:14 Very large IN-clause is slow, but how to rewrite it?