Many to many join seems slow?

From: Drew Wilson <drewmwilson(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Many to many join seems slow?
Date: 2007-05-15 13:57:55
Message-ID: 49735678-25F5-48AA-B464-0864BAE56B55@gmail.com
Lists: pgsql-performance

I'm trying to debug a query that gets all the French translations for
all US string values. Ultimately, my goal is to rank them all by edit
distance, and only pick the top N.
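
The eventual ranking query would look roughly like this (just a
sketch, using contrib/fuzzystrmatch's levenshtein(); the input string
and the LIMIT are placeholders):

SELECT s.value AS sourceValue, t.value AS translationValue,
       levenshtein(s.value, 'string to match') AS distance
FROM source s
JOIN translation_pair tp ON tp.source_id = s.source_id
JOIN translation t ON t.translation_id = tp.translation_id
JOIN language l ON l.language_id = t.language_id
WHERE l.name = 'French'
ORDER BY distance
LIMIT 10;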

However, I cannot get the basic many-to-many join to return all the
results in less than 3 seconds, which seems slow to me. (My
competition is an in-memory Perl hash that runs on client machines
and provides results in around 3 seconds, after a 30-second startup time.)

The simplified schema is:
source ->> translation_pair <<- translation

The keys are all sequence-generated OIDs. I do wonder whether using
the string values themselves as keys would give better data
distribution. Would that speed things up?

There are 159283 rows in source
There are 1723935 rows in translation, of which 159686 are French
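
For reference, the tables look roughly like this (simplified from the
real DDL, so column names and types here are approximate):

CREATE TABLE language (
    language_id  serial PRIMARY KEY,
    name         varchar UNIQUE    -- backed by the language_name_key index
);

CREATE TABLE source (
    source_id    serial PRIMARY KEY,   -- source_pkey
    value        varchar
);

CREATE TABLE translation (
    translation_id serial PRIMARY KEY,
    language_id    integer REFERENCES language,  -- indexed (translation_language_l_key)
    value          varchar
);

CREATE TABLE translation_pair (
    source_id      integer REFERENCES source,
    translation_id integer REFERENCES translation  -- indexed (translation_pair_translation_id)
);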

=# explain SELECT s.source_id, s.value AS sourceValue, t.value AS translationValue
FROM
source s,
translation_pair tp,
translation t,
language l
WHERE
s.source_id = tp.source_id
AND tp.translation_id = t.translation_id
AND t.language_id = l.language_id
AND l.name = 'French' ;

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=524224.49..732216.29 rows=92447 width=97)
   Merge Cond: (tp.source_id = s.source_id)
   ->  Sort  (cost=524224.49..524455.60 rows=92447 width=55)
         Sort Key: tp.source_id
         ->  Nested Loop  (cost=1794.69..516599.30 rows=92447 width=55)
               ->  Nested Loop  (cost=1794.69..27087.87 rows=86197 width=55)
                     ->  Index Scan using language_name_key on "language" l  (cost=0.00..8.27 rows=1 width=4)
                           Index Cond: ((name)::text = 'French'::text)
                     ->  Bitmap Heap Scan on translation t  (cost=1794.69..25882.43 rows=95774 width=59)
                           Recheck Cond: (t.language_id = l.language_id)
                           ->  Bitmap Index Scan on translation_language_l_key  (cost=0.00..1770.74 rows=95774 width=0)
                                 Index Cond: (t.language_id = l.language_id)
               ->  Index Scan using translation_pair_translation_id on translation_pair tp  (cost=0.00..5.67 rows=1 width=8)
                     Index Cond: (tp.translation_id = t.translation_id)
   ->  Index Scan using source_pkey on source s  (cost=0.00..206227.65 rows=159283 width=46)
(15 rows)

I'm running Postgres 8.2.3 on the latest Mac OS X 10.4.x. The CPU is a
3 GHz dual-core Intel Xeon, with 5 GB of RAM. The drive is very fast,
although I don't know its exact configuration (I think it's an XRaid
with 3 SAS/SCSI 70 GB Seagate drives).

The relevant performance-related configuration values are:
work_mem 32MB
shared_buffers 32MB
max_fsm_pages 204800
max_fsm_relations 1000

Thanks for any advice,

Drew
