IN operator causes sequential scan (vs. multiple OR expressions)

From: Ryan Holmes <ryan(at)hyperstep(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: IN operator causes sequential scan (vs. multiple OR expressions)
Date: 2007-01-27 23:34:56
Message-ID: AE23623B-65E0-4CC8-8F7E-3920E4B42FDC@hyperstep.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

PostgreSQL version: 8.2.1
OS: Windows Server 2003

I have a relatively simple query where the planner chooses a
sequential scan when using the IN operator but chooses an index scan
when using logically equivalent multiple OR expressions. Here is the
table structure and the two versions of the query:

CREATE TABLE pool_sample
(
id integer NOT NULL,
state character varying(25) NOT NULL,
not_pooled_reason character varying(25) NOT NULL,
"comment" character varying(255),
CONSTRAINT "pk_poolSample_id" PRIMARY KEY (id)
)
WITHOUT OIDS;

CREATE INDEX "idx_poolSample_state"
ON pool_sample
USING btree
(state);

The following query uses a sequential scan (i.e. filter) on the
"state" column and takes about 5 seconds to execute (up to ~45
seconds with an "empty" cache):
SELECT * FROM pool_sample ps
WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

This version of the query uses an index scan on "state" and takes
about 50 milliseconds:
SELECT * FROM pool_sample ps
WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state =
'READY_FOR_REPOOL';

There are over 10 million rows in the pool_sample table and 518 rows
meet the given criteria. In the first query, the planner estimates
that nearly 10 million rows will be returned (i.e. almost all rows in
the table). In the second query, the planner estimates 6830 rows,
which seems close enough for the purposes of planning.

If I explicitly cast the state column to text, the IN query uses an
index scan and performs just as well as the multiple OR version:
SELECT * FROM pool_sample ps
WHERE ps.state::text IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

So it would appear that the planner automatically casts the state
column to text within an OR expression but does not perform the cast
in an IN expression.

Our SQL is generated from an O/R mapper, so it's non-trivial (or at
least undesirable) to hand tune every query like this with an
explicit type cast. The only option I've come up with is to define
the state column as text in the first place, thus avoiding the need
to cast. Would this work? Are there any other/better options?

Thanks,
-Ryan

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-01-27 23:53:06 Re: IN operator causes sequential scan (vs. multiple OR expressions)
Previous Message Carlos Moreno 2007-01-27 21:14:44 Re: Seqscan/Indexscan still a known issue?