Subquery flattening causing sequential scan

From: Jim Crate <jimcfl(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Subquery flattening causing sequential scan
Date: 2011-12-27 17:29:14
Message-ID: 0E417F76-FE21-4FB5-8132-0C485778A041@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a subquery. A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0. This does work, reducing the query from around 1s to around 250ms, most of which is the subquery.

My question is why does it do a seq scan when it flattens this subquery into a JOIN? Is it because the emsg_messages table is around 1M rows? Are there some guidelines to when the planner will prefer not to use an available index? I just had a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable for a machine with 16GB of memory. Would the default setting of 128MB cause this behavior? I can't bounce the production server midday to test that change.

EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
)

QUERY PLAN
Hash Semi Join (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)
Hash Cond: (ema.message_id = msg.id)
-> Seq Scan on emsg_message_addresses ema (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241 rows=2875437 loops=1)
-> Hash (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on emsg_messages msg (cost=7979.35..84475.45 rows=3783 width=4) (actual time=273.224..273.387 rows=1 loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.98 rows=34611 width=0) (actual time=9.633..9.633 rows=34997 loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016 rows=34594 loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169 width=0) (actual time=5.175..5.175 rows=34594 loops=1)
Index Cond: (account_id = 314)
Total runtime: 1083.890 ms

EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
OFFSET 0
)

QUERY PLAN
Nested Loop (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)
-> HashAggregate (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)
-> Limit (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)
-> Bitmap Heap Scan on emsg_messages msg (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333 rows=1 loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type = 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.99 rows=34612 width=0) (actual time=9.693..9.693 rows=34998 loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.795..39.420 rows=34594 loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169 width=0) (actual time=5.266..5.266 rows=34594 loops=1)
Index Cond: (account_id = 314)
-> Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema (cost=0.00..14.69 rows=13 width=12) (actual time=0.056..0.058 rows=2 loops=1)
Index Cond: (ema.message_id = msg.id)
Total runtime: 273.679 ms

Jim Crate

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-12-27 18:12:46 Re: Subquery flattening causing sequential scan
Previous Message Scott Marlowe 2011-12-27 16:45:06 Re: Exploring memory usage