Why is PostgreSQL not using my index?

From: "Christian Roche" <Christian(dot)Roche(at)workshare(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Why is PostgreSQL not using my index?
Date: 2015-01-26 16:32:22
Message-ID: dd25f397d7424a50b4c548a435d6847d@LN1-EX2013-01.workshare.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi guys,

Can I take a jab at the celebrated “why is Postgres not using my index” riddle?

I’m using PostgreSQL 9.3.3 on an Amazon RDS “db.r3.xlarge” 64-bit instance. I have two tables, one with about 30M rows and two indexes (in fact a monthly partition):

CREATE TABLE staging.mixpanel_events_201409 (
date_day date NOT NULL,
event_id int NOT NULL REFERENCES mixpanel_event_list,
prop hstore
);

CREATE INDEX mixpanel_idx_date_201409
ON mixpanel_events_201409
USING btree
(date_day);

CREATE INDEX mixpanel_idx_event_201409
ON mixpanel_events_201409
USING btree
(event_id);

And a lookup table with about 600 rows:

CREATE TABLE staging.mixpanel_event_list (
id serial PRIMARY KEY,
name text UNIQUE,
source event_source NULL
);

Now when I select a subset of the possible event IDs in the big table, PG uses the appropriate index:

select *
from mixpanel_events_201409
where event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

Bitmap Heap Scan on mixpanel_events_201409 (cost=7663.36..1102862.70 rows=410022 width=949)
Recheck Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))
-> Bitmap Index Scan on mixpanel_idx_event_201409 (cost=0.00..7560.85 rows=410022 width=0)
Index Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

But when I try to join the lookup table and select from it, the index is dismissed for a full table scan with a catastrophic effect on performance:

select *
from mixpanel_events_201409 mp
inner join mixpanel_event_list ev on ( ev.id = mp.event_id )
where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

Hash Join (cost=20.73..2892183.32 rows=487288 width=1000)
Hash Cond: (mp.event_id = ev.id)
-> Seq Scan on mixpanel_events_201409 mp (cost=0.00..2809276.70 rows=20803470 width=949)
-> Hash (cost=20.57..20.57 rows=13 width=51)
-> Seq Scan on mixpanel_event_list ev (cost=0.00..20.57 rows=13 width=51)
Filter: (id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

Both tables have been vacuum analyzed.

What gives?

Thanks a lot for your help,
Chris

This email is from Workshare Limited. The information contained in and accompanying this communication may be confidential, subject to legal privilege, or otherwise protected from disclosure, and is intended solely for the use of the intended recipient(s). If you are not the intended recipient of this communication, please delete and destroy all copies in your possession and note that any review or dissemination of, or the taking of any action in reliance on, this communication is expressly prohibited. Please contact the sender if you believe you have received this email in error. Workshare Limited is a limited liability company registered in England and Wales (registered number 3559880), its registered office is at 20 Fashion Street, London, E1 6PX for further information, please refer to http://www.workshare.com.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2015-01-26 16:51:00 Re: How to tell ANALYZE to collect statistics from the whole table?
Previous Message Marc Mamin 2015-01-25 21:07:06 Re: Query performance