performance problem after VACUUM ANALYZE

From: Scott Cain <cain(at)cshl(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Cc: gmod schema <gmod-schema(at)lists(dot)sourceforge(dot)net>
Subject: performance problem after VACUUM ANALYZE
Date: 2003-02-14 16:44:00
Message-ID: 1045241040.1486.600.camel@localhost.localdomain
Lists: pgsql-performance

Hello,

I am going to do my best to describe this problem, but the description
may be quite long. Also, this is my first post to this list, so if I miss
any important details please let me know. All of the following analysis was
done on my P4 laptop with 0.5 GB of RAM, running PostgreSQL 7.3 installed
from RPM on Red Hat Linux 7.3.

I have a database with two large tables that I need to do a join on.
Here is the query that is causing the problems:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR
(fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
f.feature_id = fl.feature_id

The feature table has 3,685,287 rows and featureloc has 3,803,762 rows.
Here are all of the relevant indexes on these tables:

Index "feature_pkey"
Column | Type
------------+---------
feature_id | integer
unique btree (primary key)

Index "featureloc_idx1"
Column | Type
------------+---------
feature_id | integer
btree

Index "featureloc_idx2"
Column | Type
---------------+---------
srcfeature_id | integer
btree

Index "featureloc_src_strand_beg_end"
Column | Type
---------------+----------
srcfeature_id | integer
strand | smallint
nbeg | integer
nend | integer
btree
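
In case it is useful, here is roughly the DDL behind those indexes,
reconstructed from the listings above (the exact original statements may
differ, and feature_pkey actually comes from the primary key constraint
on feature rather than a separate CREATE INDEX):

-- approximate reconstruction of the indexes listed above
CREATE INDEX featureloc_idx1 ON featureloc (feature_id);
CREATE INDEX featureloc_idx2 ON featureloc (srcfeature_id);
CREATE INDEX featureloc_src_strand_beg_end
    ON featureloc (srcfeature_id, strand, nbeg, nend);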

In a naive database (i.e., one on which ANALYZE has never been run), the
above query runs in about 3 seconds once it has been run at least once
(due to caching?). Here is the output of EXPLAIN on that query:

Unique  (cost=75513.46..75513.48 rows=1 width=167)
  ->  Sort  (cost=75513.46..75513.46 rows=1 width=167)
        ->  Nested Loop  (cost=0.00..75513.45 rows=1 width=167)
              ->  Index Scan using featureloc_idx2 on featureloc fl  (cost=0.00..75508.43 rows=1 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..5.01 rows=1 width=153)
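
If actual row counts and timings would help, I can also post EXPLAIN
ANALYZE output (it executes the query and reports real rows and times
next to the planner's estimates), e.g.:

-- EXPLAIN ANALYZE executes the query and reports actual
-- times and row counts alongside the planner's estimates
EXPLAIN ANALYZE
select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR
 (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
f.feature_id = fl.feature_id;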

Notice that for featureloc it is using featureloc_idx2, which is the
index on srcfeature_id. Ideally, it would be using
featureloc_src_strand_beg_end, but this is not bad. In fact, if I drop
featureloc_idx2 (not something I can do in a production database), it
does use that index and cuts the query runtime in half.

Now comes the really bizarre part: if I run VACUUM ANALYZE, the
performance becomes truly awful. Instead of using an index for
featureloc, it now does a seq scan, causing the runtime to go up to
about 30 seconds. Here is the output of explain now:

Unique  (cost=344377.70..344759.85 rows=2548 width=47)
  ->  Sort  (cost=344377.70..344377.70 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..342053.97 rows=25477 width=47)
              ->  Seq Scan on featureloc fl  (cost=0.00..261709.31 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)
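
Note that the row estimate for featureloc jumps from 1 to 25477 once the
statistics exist. In case it is relevant, the statistics ANALYZE gathered
can be inspected with something like this (assuming the standard pg_stats
view):

-- column statistics collected by ANALYZE for featureloc;
-- n_distinct and most_common_vals drive the planner's row estimates
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'featureloc'
  AND attname IN ('srcfeature_id', 'strand', 'nbeg', 'nend');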

If I try to force it to use an index by setting enable_seqscan=0, it
then uses featureloc_idx1, the index on feature_id. This has slightly
worse performance than the seq scan, at just over 30 seconds. Here is
the output of explain for this case:

Unique  (cost=356513.46..356895.61 rows=2548 width=47)
  ->  Sort  (cost=356513.46..356513.46 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..354189.73 rows=25477 width=47)
              ->  Index Scan using featureloc_idx1 on featureloc fl  (cost=0.00..273845.08 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)
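
For reference, this is the sort of thing I mean by setting
enable_seqscan=0 (shown here as a per-session setting):

-- disable sequential scans for the current session so the planner
-- is forced to consider index scans
SET enable_seqscan TO off;

-- run the query / EXPLAIN here, then restore the default
SET enable_seqscan TO on;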

Now, if I drop featureloc_idx1 (again, not something I can do in
production) and still disallow seq scans, it uses featureloc_idx2, as it
did with the naive database above, but the performance is much worse,
running about 24 seconds for the query. Here is the output of explain:

Unique  (cost=1310195.21..1310577.36 rows=2548 width=47)
  ->  Sort  (cost=1310195.21..1310195.21 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..1307871.48 rows=25477 width=47)
              ->  Index Scan using featureloc_idx2 on featureloc fl  (cost=0.00..1227526.82 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)
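
For these tests, dropping and later restoring an index looks roughly like
this (the re-create is reconstructed from the index definition shown
earlier):

-- drop the index the planner keeps choosing, for testing only
DROP INDEX featureloc_idx1;

-- re-create it afterwards so other queries are not hurt
CREATE INDEX featureloc_idx1 ON featureloc (feature_id);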

Finally, if I drop featureloc_idx2, it uses the right index, and the
runtime gets back to about 2 seconds, but the database is unusable for
other queries because I dropped the other indexes and disallowed seq
scans. Here is the output for explain in this case:

Unique  (cost=1414516.98..1414899.13 rows=2548 width=47)
  ->  Sort  (cost=1414516.98..1414516.98 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..1412193.25 rows=25477 width=47)
              ->  Index Scan using featureloc_src_strand_beg_end on featureloc fl  (cost=0.00..1331848.60 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)

Now the question: can anybody tell me why the query planner does such a
bad job of figuring out how to run this query after I run VACUUM ANALYZE?
And can you give me any advice on making this arrangement work without
forcing Postgres' hand by limiting its options until it gets it right?
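
For completeness, these are the planner cost settings that presumably
feed into the estimates above; they can be checked like this (I can
report the values if that would help):

-- planner cost settings that influence the index-vs-seqscan decision
SHOW random_page_cost;
SHOW effective_cache_size;
SHOW cpu_tuple_cost;
SHOW cpu_index_tuple_cost;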

Thank you very much for reading down to here,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain(at)cshl(dot)org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
