From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | merge join killing performance |
Date: | 2010-05-19 00:17:33 |
Message-ID: | AANLkTikmwxQJIDfefCoK_SQN2DTwQs1naLwS-RW8PgRU@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Machine: 8 core AMD opteron 2.1GHz, 12 disk RAID-10, 2 disk pg_xlog,
RHEL 5.4 pg version 8.3.9 (upgrading soon to 8.3.11 or so)
This query:
SELECT sum(f.bytes) AS sum FROM files f INNER JOIN events ev ON f.eid
= ev.eid WHERE ev.orgid = 969677;
is choosing a merge join, which never returns from explain analyze (it
might after 10 or so minutes, but I'm not beating up my production
server over it)
Aggregate (cost=902.41..902.42 rows=1 width=4)
-> Merge Join (cost=869.97..902.40 rows=1 width=4)
Merge Cond: (f.eid = ev.eid)
-> Index Scan using files_eid_idx on files f
(cost=0.00..157830.39 rows=3769434 width=8)
-> Sort (cost=869.52..872.02 rows=1002 width=4)
Sort Key: ev.eid
-> Index Scan using events_orgid_idx on events ev
(cost=0.00..819.57 rows=1002 width=4)
Index Cond: (orgid = 969677)
If I turn off mergejoin it's fast:
explain analyze SELECT sum(f.bytes) AS sum FROM files f INNER JOIN
events ev ON f.eid = ev.eid WHERE ev.orgid = 969677;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3653.28..3653.29 rows=1 width=4) (actual
time=1.541..1.541 rows=1 loops=1)
-> Nested Loop (cost=0.00..3653.28 rows=1 width=4) (actual
time=1.537..1.537 rows=0 loops=1)
-> Index Scan using events_orgid_idx on events ev
(cost=0.00..819.57 rows=1002 width=4) (actual time=0.041..0.453
rows=185 loops=1)
Index Cond: (orgid = 969677)
-> Index Scan using files_eid_idx on files f
(cost=0.00..2.82 rows=1 width=8) (actual time=0.005..0.005 rows=0
loops=185)
Index Cond: (f.eid = ev.eid)
Total runtime: 1.637 ms
I've played around with random_page_cost. All the other things you'd
expect, like effective_cache_size are set rather large (it's a server
with 32Gig ram and a 12 disk RAID-10) and no setting of
random_page_cost forces it to choose the non-mergejoin plan.
Anybody with any ideas, I'm all ears.
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2010-05-19 02:40:19 | Re: Stefan's bug (was: max_standby_delay considered harmful) |
Previous Message | Florian Pflug | 2010-05-19 00:15:42 | Re: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle |
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew Wakeling | 2010-05-19 03:00:18 | Re: merge join killing performance |
Previous Message | Carlo Stonebanks | 2010-05-18 22:28:25 | Does FILTER in SEQSCAN short-circuit AND? |