Re: Simple join doesn't use index

From: Alex Vinnik <alvinnik(dot)g(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple join doesn't use index
Date: 2013-01-29 14:41:50
Message-ID: CALd8TVEo3Gude58qVjk6EUHoWTiQ2969pfbQsugx3gGbpwruNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Setting work_mem to 64MB triggers in memory sort but look what happens with
views look up. PG goes through all records there "Seq Scan on views"
instead of using visitor_id index and I have only subset of real data to
play around. Can imagine what cost would be running it against bigger
dataset. Something else is in play here that makes planner to take this
route. Any ideas how to gain more insight into planner's inner workings?

QUERY PLAN
Sort (cost=960280.46..960844.00 rows=225414 width=8) (actual
time=23328.040..23537.126 rows=209401 loops=1)
Sort Key: visits.id, views.id
Sort Method: quicksort Memory: 15960kB
-> Hash Join (cost=8089.16..940238.66 rows=225414 width=8) (actual
time=6622.072..22995.890 rows=209401 loops=1)
Hash Cond: (views.visit_id = visits.id)
-> Seq Scan on views (cost=0.00..831748.05 rows=8724205 width=8)
(actual time=0.093..10552.306 rows=6995893 loops=1)
-> Hash (cost=6645.51..6645.51 rows=115492 width=4) (actual
time=307.389..307.389 rows=131311 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 4617kB
-> Index Scan using visits_created_at_index on visits
(cost=0.00..6645.51 rows=115492 width=4) (actual time=0.040..163.151
rows=131311 loops=1)
Index Cond: ((created_at >= '2013-01-15
00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
00:00:00'::timestamp without time zone))
Total runtime: 23733.045 ms

On Mon, Jan 28, 2013 at 8:31 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik <alvinnik(dot)g(at)gmail(dot)com> wrote:
> > It sure turned out that default settings are not a good fit. Setting
> > random_page_cost to 1.0 made query to run in 2.6 seconds and I clearly
> see
> > that indexes are being used in explain plan and IO utilization is close
> to
> > 0.
> >
> > QUERY PLAN
> > Sort (cost=969787.23..970288.67 rows=200575 width=8) (actual
> > time=2176.045..2418.162 rows=241238 loops=1)
> > Sort Key: visits.id, views.id
> > Sort Method: external sort Disk: 4248kB
> > -> Nested Loop (cost=0.00..950554.81 rows=200575 width=8) (actual
> > time=0.048..1735.357 rows=241238 loops=1)
> > -> Index Scan using visits_created_at_index on visits
> > (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..178.591
> > rows=136021 loops=1)
> > Index Cond: ((created_at >= '2012-12-15
> 00:00:00'::timestamp
> > without time zone) AND (created_at < '2012-12-16 00:00:00'::timestamp
> > without time zone))
> > -> Index Scan using views_visit_id_index on views
> > (cost=0.00..11.33 rows=12 width=8) (actual time=0.004..0.006 rows=2
> > loops=136021)
> > Index Cond: (visit_id = visits.id)
> > Total runtime: 2635.169 ms
> >
> > However I noticed that sorting is done using disk("external sort Disk:
> > 4248kB") which prompted me to take a look at work_mem. But it turned out
> > that small increase to 4MB from default 1MB turns off index usage and
> query
> > gets x10 slower. IO utilization jumped to 100% from literally nothing. so
> > back to square one...
> >
> > QUERY PLAN
> > Sort (cost=936642.75..937144.19 rows=200575 width=8) (actual
> > time=33200.762..33474.443 rows=241238 loops=1)
> > Sort Key: visits.id, views.id
> > Sort Method: external merge Disk: 4248kB
> > -> Hash Join (cost=6491.17..917410.33 rows=200575 width=8) (actual
> > time=7156.498..32723.221 rows=241238 loops=1)
> > Hash Cond: (views.visit_id = visits.id)
> > -> Seq Scan on views (cost=0.00..832189.95 rows=8768395
> width=8)
> > (actual time=0.100..12126.342 rows=8200704 loops=1)
> > -> Hash (cost=5459.16..5459.16 rows=82561 width=4) (actual
> > time=353.683..353.683 rows=136021 loops=1)
> > Buckets: 16384 Batches: 2 (originally 1) Memory Usage:
> > 4097kB
> > -> Index Scan using visits_created_at_index on visits
> > (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..175.051
> > rows=136021 loops=1)
> > Index Cond: ((created_at >= '2012-12-15
> > 00:00:00'::timestamp without time zone) AND (created_at < '2012-12-16
> > 00:00:00'::timestamp without time zone))
> > Total runtime: 33698.000 ms
> >
> > Basically PG is going through all views again and not using "Index Scan
> > using views_visit_id_index on views". Looks like setting work_mem
> confuses
> > planner somehow. Any idea what can be done to do sorting in memory. I
> > suspect it should make query even more faster. Thanks -Alex
>
> hm, what happens when you set work_mem a fair amount higher? (say,
> 64mb). You can set it for one session by going "set work_mem='64mb';
> " as opposed to the entire server in postgresql.conf.
>
> merlin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Filip Rembiałkowski 2013-01-29 16:19:19 Re: Simple join doesn't use index
Previous Message Alex Vinnik 2013-01-29 14:24:10 Re: Simple join doesn't use index