postgresql does seqscan instead of using an existing index

From: "Jan Weerts" <j(dot)weerts(at)i-views(dot)de>
To: <pgsql-general(at)postgresql(dot)org>
Subject: postgresql does seqscan instead of using an existing index
Date: 2002-09-05 17:49:31
Message-ID: B349BABAF9A92F4D9FBFCADF8D5FEDD508108B@ivsrv03.i-views.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all!

I hope this is the right list for this question, if not, please
direct me to the appropriate one. This mail is rather longish, so
thanks in advance to all, who dare to read :-).

We are using PostgreSQL-7.1.2 (the upgrade is planned) to manage
the source code of a development team. We have read performance
issues, which increased over the time of the usage (and filling)
of this PostgreSQL database. Writing performance is acceptable.

Analyzing the typical query sequence, when loading a piece of
sourcecode, we found that a special query pattern consumes most
of the time. A sample query looks like this:

SELECT * FROM TW_ClassRecord
WHERE primaryKey IN (
SELECT metaclassRef FROM TW_PkgClasses
WHERE packageRef = 10047 AND metaclassRef NOT IN (
SELECT metaclassRef FROM TW_PkgClasses
WHERE packageRef = 10023
)
)

This query takes over 30 seconds to complete. There exist indexes
on TW_PkgClasses(packageRef) and naturally the primaryKey index on
TW_ClassRecord (both freshly vacuumed and analyzed).

an explain of the above query gives:
Seq Scan on tw_classrecord (cost=100000000.00..420280692.87
rows=310762 width=68)
SubPlan
-> Materialize (cost=1030.60..1030.60 rows=18 width=4)
-> Index Scan using tw_pkgclasses_packageref_index on
tw_pkgclasses (cost=0.00..1030.60 rows=18 width=4)
SubPlan
-> Materialize (cost=54.82..54.82 rows=18 width=4)
-> Index Scan using tw_pkgclasses_packageref_index on
tw_pkgclasses (cost=0.00..54.82 rows=18 width=4)

As you see, the planner does not want to use the index on the
primaryKey of tw_classrecord for the outermost part of this query.
Setting enable_seqscan to off had no effect on the planner.

Breaking the query in two parts, gives a much better performance
(much less than a second each) and uses the index:
SELECT metaclassRef FROM TW_PkgClasses
WHERE packageRef = 10047 AND metaclassRef NOT IN (
SELECT metaclassRef FROM TW_PkgClasses
WHERE packageRef = 10023
)
delivers 138 integers.

SELECT * FROM TW_ClassRecord
WHERE primaryKey IN ( <INSERT THE 138 INTEGERS HERE> )

explained, delivers:

Index Scan using tw_classrecord_pkey, tw_classrecord_pkey, .....
on tw_classrecord (cost=0.00..698.67 rows=14 width=68)

The used tables:
TW_PkgClasses 306692 rows
4 integer columns, one multicolumn index, where packageRef
is the first row
TW_ClassRecord 310762 rows
5 integers, 4 varchars, 3 single column indices on
the primarykey and two of the varchar fields

My question is: How do I convince PostgreSQL to use the index
when executing the original query? Any other solution would
also be welcomed (we already thought about changing the
development environment query mechanisms...).

Before I get the question: The DB is located on a 700MHz Linux 2.2 box
with a single fast IDE drive and 512MB RAM. A second PostgreSQL
database on the same machine carries much less data and read response
times are very good there. The machine itself is not loaded, all tests
were performed off-hours, meaning just a single user of the database.
I guess, that the machine is not the issue.

If you need any further data, which I did not provide here, please let
me know.

Thanks in advance
Jan

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Oleg Bartunov 2002-09-05 18:01:35 Re: Problem with restoring dump (may be tsearch-related)
Previous Message Nigel J. Andrews 2002-09-05 17:33:54 Re: "...integer[] references..." = error