Simple join doesn't use index

From: Alex Vinnik <alvinnik(dot)g(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Simple join doesn't use index
Date: 2013-01-03 22:54:10
Message-ID: CALd8TVECZDosPDZ7Cwv1gVRy1p2F4Hg+sw+240MsN36KUkt31g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everybody,

I have implemented my first app using PG DB and thought for a minute(may be
two) that I know something about PG but below problem totally destroyed my
confidence :). Please help me to restore it.

Here is simple join query. It runs just fine on MS SQL 2008 and uses
all available indexes using even bigger overall dataset.

select visits.id, views.id
from visits join views on visits.id = views.visit_id
where visits.created_at >= '11/15/2012' and visits.created_at <
'11/16/2012'

Quick performance stat

MS SQL: 1 second, 264K rows
PG: 158 seconds, 264K rows

Explain plan from both DBs

PG QUERY PLAN
Hash Join (cost=12716.17..1101820.09 rows=248494 width=8)
Hash Cond: (views.visit_id = visits.id)
-> Seq Scan on views (cost=0.00..819136.56 rows=17434456 width=8)
-> Hash (cost=10549.16..10549.16 rows=132081 width=4)
-> Index Scan using visits_created_at_index on visits
(cost=0.00..10549.16 rows=132081 width=4)
Index Cond: ((created_at >= '2012-11-15 00:00:00'::timestamp
without time zone) AND (created_at < '2012-11-16 00:00:00'::timestamp
without time zone))

schemaname | tablename | indexname | tablespace |
indexdef

------------+-----------+---------------------------------+------------+------------------------------------------------------------------------------------------
public | views | views_pkey | |
CREATE UNIQUE INDEX views_pkey ON views USING btree (id)
public | views | views_visit_id_index | |
CREATE INDEX views_visit_id_index ON views USING btree (visit_id)

MS SQL Query plan
'11/16/2012'
|--Parallelism(Gather Streams)
|--Nested Loops(Inner Join, OUTER REFERENCES:([visits].[id],
[Expr1006]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Index Seek(OBJECT:([visits].[test]),
SEEK:([visits].[created_at] >= '2012-11-15 00:00:00.000' AND
[visits].[created_at] < '2012-11-16 00:00:00.000') ORDERED FORWARD)
|--Index Seek(OBJECT:([views].[views_visit_id_index]),
SEEK:([views].[visit_id]=[raw_visits].[id]) ORDERED FORWARD)

It is clear that PG does full table scan "Seq Scan on views
(cost=0.00..819136.56 rows=17434456 width=8)"

Don't understand why PG doesn't use views_visit_id_index in that query but
rather scans whole table. One explanation I have found that when resulting
dataset constitutes ~15% of total number of rows in the table then seq scan
is used. In this case resulting dataset is just 1.5% of total number of
rows. So it must be something different. Any reason why it happens and how
to fix it?

Postgres 9.2
Ubuntu 12.04.1 LTS
shared_buffers = 4GB the rest of the settings are default ones

Thanks
-Alex

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeremy Harris 2013-01-03 23:11:40 Re: Simple join doesn't use index
Previous Message Daniel Westermann 2013-01-03 18:34:21 Re: FW: performance issue with a 2.5gb joinded table