Skip site navigation (1) Skip section navigation (2)

Re: IN operator causes sequential scan (vs. multiple OR expressions)

From: Ryan Holmes <ryan(at)hyperstep(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: IN operator causes sequential scan (vs. multiple OR expressions)
Date: 2007-01-28 01:34:12
Message-ID: 4F003DBB-34E7-4E9D-A8B3-27AA0CBD66C3@hyperstep.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Jan 27, 2007, at 3:53 PM, Tom Lane wrote:

> Ryan Holmes <ryan(at)hyperstep(dot)com> writes:
>> I have a relatively simple query where the planner chooses a
>> sequential scan when using the IN operator but chooses an index scan
>> when using logically equivalent multiple OR expressions.
>
> EXPLAIN ANALYZE for both, please?
>
> If you set enable_seqscan = off, does that force an indexscan, and  
> if so
> what does EXPLAIN ANALYZE show in that case?
>
> 			regards, tom lane

Wow, I didn't expect such a quick response -- thank you!
Note: I rebuilt the state column index and ran a VACUUM ANALYZE since  
my original post, so the planner's "rows" estimate is now different  
than the 6830 I mentioned. The planner estimate is actually *less*  
accurate now, but still in the ballpark relatively speaking.

Here is the EXPLAIN ANALYZE for both queries with enable_seqscan = on :

EXPLAIN ANALYZE SELECT * FROM pool_sample ps
WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state =  
'READY_FOR_REPOOL';

Bitmap Heap Scan on pool_sample ps  (cost=985.51..61397.50 rows=50022  
width=539) (actual time=13.560..39.377 rows=518 loops=1)
   Recheck Cond: (((state)::text = 'PENDING_REPOOL_REVIEW'::text) OR  
((state)::text = 'READY_FOR_REPOOL'::text))
   ->  BitmapOr  (cost=985.51..985.51 rows=50084 width=0) (actual  
time=9.628..9.628 rows=0 loops=1)
         ->  Bitmap Index Scan on  
"idx_poolSample_state"  (cost=0.00..480.25 rows=25042 width=0)  
(actual time=0.062..0.062 rows=4 loops=1)
               Index Cond: ((state)::text =  
'PENDING_REPOOL_REVIEW'::text)
         ->  Bitmap Index Scan on  
"idx_poolSample_state"  (cost=0.00..480.25 rows=25042 width=0)  
(actual time=9.563..9.563 rows=514 loops=1)
               Index Cond: ((state)::text = 'READY_FOR_REPOOL'::text)
Total runtime: 39.722 ms


EXPLAIN ANALYZE SELECT * FROM pool_sample ps
WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

Seq Scan on pool_sample ps  (cost=0.00..331435.92 rows=9667461  
width=539) (actual time=1060.472..47584.542 rows=518 loops=1)
   Filter: ((state)::text = ANY  
(('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying 
[])::text[]))
Total runtime: 47584.698 ms



And now with enable_seqscan = off:

EXPLAIN ANALYZE SELECT * FROM pool_sample ps
WHERE ps.state = 'PENDING_REPOOL_REVIEW' OR ps.state =  
'READY_FOR_REPOOL';

Bitmap Heap Scan on pool_sample ps  (cost=985.51..61397.50 rows=50022  
width=539) (actual time=0.324..0.601 rows=518 loops=1)
   Recheck Cond: (((state)::text = 'PENDING_REPOOL_REVIEW'::text) OR  
((state)::text = 'READY_FOR_REPOOL'::text))
   ->  BitmapOr  (cost=985.51..985.51 rows=50084 width=0) (actual  
time=0.287..0.287 rows=0 loops=1)
         ->  Bitmap Index Scan on  
"idx_poolSample_state"  (cost=0.00..480.25 rows=25042 width=0)  
(actual time=0.109..0.109 rows=4 loops=1)
               Index Cond: ((state)::text =  
'PENDING_REPOOL_REVIEW'::text)
         ->  Bitmap Index Scan on  
"idx_poolSample_state"  (cost=0.00..480.25 rows=25042 width=0)  
(actual time=0.176..0.176 rows=514 loops=1)
               Index Cond: ((state)::text = 'READY_FOR_REPOOL'::text)
Total runtime: 0.779 ms


EXPLAIN ANALYZE SELECT * FROM pool_sample ps
WHERE ps.state IN ('PENDING_REPOOL_REVIEW', 'READY_FOR_REPOOL');

Bitmap Heap Scan on pool_sample ps  (cost=150808.51..467822.04  
rows=9667461 width=539) (actual time=0.159..0.296 rows=518 loops=1)
   Recheck Cond: ((state)::text = ANY  
(('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying 
[])::text[]))
   ->  Bitmap Index Scan on  
"idx_poolSample_state"  (cost=0.00..148391.65 rows=9667461 width=0)  
(actual time=0.148..0.148 rows=518 loops=1)
         Index Cond: ((state)::text = ANY  
(('{PENDING_REPOOL_REVIEW,READY_FOR_REPOOL}'::character varying 
[])::text[]))
Total runtime: 0.445 ms



So, yes, disabling seqscan does force an index scan for the IN  
version. My question now is, how do I get PostgreSQL to make the  
"right" decision without disabling seqscan?

Here are the non-default resource usage and query tuning settings  
from postgresql.conf:
shared_buffers = 512MB
work_mem = 6MB
maintenance_work_mem = 256MB
random_page_cost = 3.0
effective_cache_size = 1536MB
from_collapse_limit = 12
join_collapse_limit = 12

The server has 4GB RAM, 2 X 2.4GHz Opteron dual core procs, 5 x 15k  
RPM disks in a RAID 5 array and runs Windows Server 2003 x64.

Thanks,
-Ryan

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2007-01-28 01:56:39
Subject: Re: IN operator causes sequential scan (vs. multiple OR expressions)
Previous:From: Tom LaneDate: 2007-01-27 23:53:06
Subject: Re: IN operator causes sequential scan (vs. multiple OR expressions)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group