Skip site navigation (1) Skip section navigation (2)

IN operator causes sequential scan (vs. multiple OR expressions)

From: Ryan Holmes <ryan(at)hyperstep(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: IN operator causes sequential scan (vs. multiple OR expressions)
Date: 2007-01-27 23:34:56
Message-ID: AE23623B-65E0-4CC8-8F7E-3920E4B42FDC@hyperstep.com (view raw or flat)
Thread:
Lists: pgsql-performance
PostgreSQL version: 8.2.1
OS: Windows Server 2003

I have a relatively simple query where the planner chooses a  
sequential scan when using the IN operator but chooses an index scan  
when using logically equivalent multiple OR expressions. Here is the  
table structure and the two versions of the query:

CREATE TABLE pool_sample
(
   id integer NOT NULL,
   state character varying(25) NOT NULL,
   not_pooled_reason character varying(25) NOT NULL,
   "comment" character varying(255),
   CONSTRAINT "pk_poolSample_id" PRIMARY KEY (id)
)
WITHOUT OIDS;

CREATE INDEX "idx_poolSample_state"
   ON pool_sample
   USING btree
   (state);


The following query uses a sequential scan (i.e. filter) on the  
"state" column and takes about 5 seconds to execute (up to ~45  
seconds with an "empty" cache):
SELECT * FROM pool_sample ps
WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

This version of the query uses an index scan on "state" and takes  
about 50 milliseconds:
SELECT * FROM pool_sample ps
WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state =  
'READY_FOR_REPOOL';

There are over 10 million rows in the pool_sample table and 518 rows  
meet the given criteria. In the first query, the planner estimates  
that nearly 10 million rows will be returned (i.e. almost all rows in  
the table). In the second query, the planner estimates 6830 rows,  
which seems close enough for the purposes of planning.

If I explicitly cast the state column to text, the IN query uses an  
index scan and performs just as well as the multiple OR version:
SELECT * FROM pool_sample ps
WHERE ps.state::text IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

So it would appear that the planner automatically casts the state  
column to text within an OR expression but does not perform the cast  
in an IN expression.

Our SQL is generated from an O/R mapper, so it's non-trivial (or at  
least undesirable) to hand tune every query like this with an  
explicit type cast. The only option I've come up with is to define  
the state column as text in the first place, thus avoiding the need  
to cast. Would this work? Are there any other/better options?

Thanks,
-Ryan





Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2007-01-27 23:53:06
Subject: Re: IN operator causes sequential scan (vs. multiple OR expressions)
Previous:From: Carlos MorenoDate: 2007-01-27 21:14:44
Subject: Re: Seqscan/Indexscan still a known issue?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group