Need help - optimizer trouble

From: Helge Bahmann <bahmann(at)math(dot)tu-freiberg(dot)de>
To: pgsql-novice(at)postgresql(dot)org
Subject: Need help - optimizer trouble
Date: 2001-04-04 22:01:39
Message-ID: Pine.LNX.4.21.0104042351270.18606-100000@lothlorien.stunet2.tu-freiberg.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi!

I am desperate with 7.1beta4; I have a serious performance problem that
does not manifest in 7.0.2. This is currently just a feasability test, so
I am willing to try every hack.

The problematic part of my schema looks as follows:

document written_by author author_keyword
-------- 1 n ---------- ---------- --------------
*doc_id <---- *doc_id n 1 *name 1 n *keyword
doc *author_id ----> *author_id <---- *author_id

columns marked with an asterik have an index

document contains ~100 000 tuples; each is "written_by" 1-3 out of
~10 000 authors, and every author is associated with ~4 keywords

I am trying to retrieve documents written by authors associated with
a given keyword. I can get the doc_ids using:

SELECT doc_id FROM written_by AS wb
JOIN author_keyword AS kw ON kw.author_id=wb.author_id
WHERE kw.keyword='foo'

7.1beta4 and 7.0.2 both use the following query plan:

Nested Loop (cost=0.00..18.37 rows=8 width=20)
-> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4)
-> Index Scan using written_by_idx on written_by (cost=0.00..16.29 rows=4 width=16)

I can then take the values returned and retrieve the documents in a
separate query; the two queries take at most a second. However if
I add a join to get the documents directly:

SELECT doc FROM document
JOIN written_by AS wb ON document.doc_id=wb.doc_id
JOIN author_keyword AS kw ON kw.author_id=wb.author_id
WHERE kw.keyword='foo'

7.0.2 will use the following query plan, query takes ~1 second:

Nested Loop (cost=0.00..27.53 rows=2 width=32)
-> Nested Loop (cost=0.00..10.00 rows=4 width=20)
-> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4)
-> Index Scan using written_by_idx on ist_autor_von (cost=0.00..7.95 rows=2 width=16)
-> Index Scan using document_pkey on document (cost=0.00..4.16 rows=1 width=12)

7.1beta4 uses the following query plan, query takes ~150 seconds:

Nested Loop (cost=8562.09..39846.62 rows=4 width=32)
-> Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4)
-> Materialize (cost=37969.52..37969.52 rows=100000 width=28)
-> Hash Join (cost=8562.09..37969.52 rows=100000 width=28)
-> Seq Scan on document (cost=0.00..11932.00 rows=100000 width=12)
-> Hash (cost=5129.55..5129.55 rows=203555 width=16)
-> Seq Scan on written_by (cost=0.00..5129.55 rows=303555 width=16)

The two databases are not identical, but they were created using the
same script which fills the database with random data of the above
structure. I recreated the databases several times, so it is completely
repeatable.

I tried playing with the optimizer parameters, but it only got worse
because I do not really understand what they are all doing.

I fell uncomfortable with the idea of having to split up my query into
two separate ones. Can anyone give me any hints how to influence the 7.1
optimizer to behave like 7.0.2 in this case? Is there something wrong with
my query? Is there an alternative form which the optimizer can handle
better?

Thanks for your patience

Helge

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Neil Conway 2001-04-04 22:25:49 Re: Need help - optimizer trouble
Previous Message ADBAAMD 2001-04-04 18:17:12 Re: Postgresql.7.0.3