Skip site navigation (1) Skip section navigation (2)

merge join killing performance

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: merge join killing performance
Date: 2010-05-19 00:17:33
Message-ID: AANLkTikmwxQJIDfefCoK_SQN2DTwQs1naLwS-RW8PgRU@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Machine: 8 core AMD opteron 2.1GHz, 12 disk RAID-10, 2 disk pg_xlog,
RHEL 5.4 pg version 8.3.9 (upgrading soon to 8.3.11 or so)

This query:
SELECT sum(f.bytes) AS sum FROM files f INNER JOIN events ev ON f.eid
= ev.eid WHERE ev.orgid = 969677;

is choosing a merge join, which never returns from explain analyze (it
might after 10 or so minutes, but I'm not beating up my production
server over it)

 Aggregate  (cost=902.41..902.42 rows=1 width=4)
   ->  Merge Join  (cost=869.97..902.40 rows=1 width=4)
         Merge Cond: (f.eid = ev.eid)
         ->  Index Scan using files_eid_idx on files f
(cost=0.00..157830.39 rows=3769434 width=8)
         ->  Sort  (cost=869.52..872.02 rows=1002 width=4)
               Sort Key: ev.eid
               ->  Index Scan using events_orgid_idx on events ev
(cost=0.00..819.57 rows=1002 width=4)
                     Index Cond: (orgid = 969677)


If I turn off mergejoin it's fast:

explain analyze SELECT sum(f.bytes) AS sum FROM files f INNER JOIN
events ev ON f.eid = ev.eid WHERE ev.orgid = 969677;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3653.28..3653.29 rows=1 width=4) (actual
time=1.541..1.541 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..3653.28 rows=1 width=4) (actual
time=1.537..1.537 rows=0 loops=1)
         ->  Index Scan using events_orgid_idx on events ev
(cost=0.00..819.57 rows=1002 width=4) (actual time=0.041..0.453
rows=185 loops=1)
               Index Cond: (orgid = 969677)
         ->  Index Scan using files_eid_idx on files f
(cost=0.00..2.82 rows=1 width=8) (actual time=0.005..0.005 rows=0
loops=185)
               Index Cond: (f.eid = ev.eid)
 Total runtime: 1.637 ms

I've played around with random_page_cost.  All the other things you'd
expect, like effective_cache_size are set rather large (it's a server
with 32Gig ram and a 12 disk RAID-10) and no setting of
random_page_cost forces it to choose the non-mergejoin plan.

Anybody with any ideas, I'm all ears.

Responses

pgsql-performance by date

Next:From: Matthew WakelingDate: 2010-05-19 03:00:18
Subject: Re: merge join killing performance
Previous:From: Carlo StonebanksDate: 2010-05-18 22:28:25
Subject: Does FILTER in SEQSCAN short-circuit AND?

pgsql-hackers by date

Next:From: Fujii MasaoDate: 2010-05-19 02:40:19
Subject: Re: Stefan's bug (was: max_standby_delay considered harmful)
Previous:From: Florian PflugDate: 2010-05-19 00:15:42
Subject: Re: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group