Re: IN operator causes sequential scan (vs. multiple OR expressions)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ryan Holmes <ryan(at)hyperstep(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: IN operator causes sequential scan (vs. multiple OR expressions)
Date: 2007-01-28 01:56:39
Message-ID: 26081.1169949399@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Ryan Holmes <ryan(at)hyperstep(dot)com> writes:
> So, yes, disabling seqscan does force an index scan for the IN
> version. My question now is, how do I get PostgreSQL to make the
> "right" decision without disabling seqscan?

I pinged you before because in a trivial test case I got
indexscans out of both text and varchar cases:

regression=# create table foo (f1 text unique, f2 varchar(25) unique);
NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_f1_key" for table "foo"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_f2_key" for table "foo"
CREATE TABLE
regression=# explain select * from foo where f1 in ('foo', 'bar');
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=4.52..9.86 rows=2 width=61)
Recheck Cond: (f1 = ANY ('{foo,bar}'::text[]))
-> Bitmap Index Scan on foo_f1_key (cost=0.00..4.52 rows=2 width=0)
Index Cond: (f1 = ANY ('{foo,bar}'::text[]))
(4 rows)

regression=# explain select * from foo where f2 in ('foo', 'bar');
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=6.59..17.27 rows=10 width=61)
Recheck Cond: ((f2)::text = ANY (('{foo,bar}'::character varying[])::text[]))
-> Bitmap Index Scan on foo_f2_key (cost=0.00..6.59 rows=10 width=0)
Index Cond: ((f2)::text = ANY (('{foo,bar}'::character varying[])::text[]))
(4 rows)

But on closer inspection the second case is not doing the right thing:
notice the rowcount estimate is 10, whereas it should be only 2 because
of the unique index on f2. I poked into it and realized that in 8.2
scalararraysel() fails to deal with binary-compatible datatype cases,
instead falling back to a not-very-bright generic estimate.

I've committed a fix for 8.2.2, but in the meantime maybe you could
change your varchar column to text?

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ryan Holmes 2007-01-28 02:31:45 Re: IN operator causes sequential scan (vs. multiple OR expressions)
Previous Message Ryan Holmes 2007-01-28 01:34:12 Re: IN operator causes sequential scan (vs. multiple OR expressions)