Weird query plans for my queries, causing terrible performance.

From: Arjen van der Meijden <acm(at)tweakers(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Weird query plans for my queries, causing terrible performance.
Date: 2003-01-31 00:06:32
Message-ID: b1celn$37k$1@news.tudelft.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

To create a list of recent topics in a forum I use these queries in mysql:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE ForumID = 15 AND Lastmessage > '2002-08-01 23:27:03+02'
AND Deleted = false
ORDER BY Lastmessage DESC
LIMIT 1000

and

SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE StatusID IN(1) AND Deleted = false AND ForumID = 15

Where in the client-code some simple checks are done on the queryresults
(for the rights a user has), which would make the queries quite slow on
mysql.

And after that it simply selects the needed info from the database using
a in list:

SELECT F_Topics.Name, F_Topics.TopicID, etc
FROM
F_Topics
LEFT JOIN
F_Users AS L_Users ON (L_Users.UserID = F_Topics.LastposterID)
, F_Users
WHERE
TopicID IN (list of topicids gathered from the above queries)
AND
F_Users.UserID = F_Topics.UserID
AND
!F_Topics.Deleted
AND
F_Topics.ForumID=15

Together it takes around 60ms to finish in mysql. That isn't the nicest
query setup. So I tried to create a single query for this in postgresql.

The F_Topics table has some fields, including TopicID (primary key),
UserID (foreign key to F_Users.UserID), LastPosterID (nullable foreign
to F_Users.UserID), LastMessage (timestamp), ForumID, StatusID (being a
integer with just 3 different values) and deleted (boolean)

There are three indices:
f_topics_pkey (on the topicid)
f_topics_forum_lastmessage_deleted (forumid, lastmessage, deleted)
f_topics_forum_status_deleted (forumid, statusid, deleted)

and the only relevant index on the F_Users table is its primary key on
UserID.

The query:
SELECT F_Topics.TopicID,F_Topics.StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) OR F_Topics.Lastmessage > '2002-08-01 23:27:03+02')
AND F_Topics.Deleted = false AND ForumID = 15

Results in the plan:

Seq Scan on f_topics (cost=0.00..20347.58 rows=59 width=12)
Filter: (((statusid = 1) OR (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone)) AND (deleted = false) AND
(forumid = 15))
Which would take around 2 seconds to complete.

While the query:
SELECT TopicID, StatusID, UserID
FROM F_Topics
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15)

Results in the plan:

Index Scan using f_topics_forum_status_deleted,
f_topics_forum_lastmessage_deleted on f_topics (cost=0.00..241.87
rows=60 width=12)
Index Cond: (((forumid = 15) AND (statusid = 1) AND (deleted =
false)) OR ((forumid = 15) AND (lastmessage > '2002-08-01
23:27:03'::timestamp without time zone) AND (deleted = false)))
Filter: (((statusid = 1) AND (deleted = false) AND (forumid = 15))
OR ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
AND (deleted = false) AND (forumid = 15)))
(3 rows)

Executing in around 9ms and exactly what I need.

Of coarse the query needs to be joined with the F_Users table.
My tries were:
SELECT TopicID, StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (StatusID IN(1) AND Deleted = false AND ForumID = 15 AND
F_Topics.UserID = F_Users.UserID)
OR
(Lastmessage > '2002-08-01 23:27:03+02' AND Deleted = false AND ForumID
= 15 AND F_Topics.UserID = F_Users.UserID)

and

SELECT F_Topics.TopicID,F_Topics.StatusID as StatusID, F_Users.UserID
FROM F_Topics, F_Users
WHERE (
(StatusID IN(1) AND F_Topics.Deleted = false AND ForumID = 15)
OR
(F_Topics.Lastmessage > '2002-08-01 23:27:03+02' AND F_Topics.Deleted =
false AND ForumID = 15))
AND F_Topics.UserID = F_Users.UserID

(and a inner join'ed version of the above)

All three took 500-600ms to complete and changed the query plan to
something similar to:
Nested Loop (cost=0.00..23788.87 rows=1 width=16)
-> Index Scan using f_topics_forum_status_deleted on f_topics
(cost=0.00..23785.84 rows=1 width=12)
Index Cond: (forumid = 15)
Filter: (((lastmessage > '2002-08-01 23:27:03'::timestamp
without time zone) OR (statusid = 1)) AND ((deleted = false) OR
(statusid = 1)) AND ((forumid = 15) OR (statusid = 1)) AND ((lastmessage
> '2002-08-01 23:27:03'::timestamp without time zone) OR (deleted =
false)) AND (deleted = false) AND ((forumid = 15) OR (deleted = false))
AND ((lastmessage > '2002-08-01 23:27:03'::timestamp without time zone)
OR (forumid = 15)) AND ((deleted = false) OR (forumid = 15)))
-> Index Scan using f_users_pkey on f_users (cost=0.00..3.01
rows=1 width=4)
Index Cond: ("outer".userid = f_users.userid)
(6 rows)

I.e. It creates a much harder query plan to execute and terribly slows
down what should be much simpler and much faster.

By the way, I did do 'vacuum full analyze' before gathering these plans.
The result of the above queries is 134 rows from 549679 topicrows in
total. 3 rows being selected due to the status = 1 and 131 due to the
lastmessage time restriction.
In total there are 15943 topics in the 15th forum, in total 102 having a
statusid = 1, 5321 meeting the last message requirement and 15 topics
have the deleted boolean set to true.

I hope someone can shine a bright light on this, since I'm out of clues.
I also tries using subselects for the separate parts either union'ed
together or OR'ed together. Resulting in even worse plans (with
seq-scans for the f_users.userid for instance, or the f_topics.topicid)
of over 60 seconds execution time.

Best regards,

Arjen van der Meijden

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris White 2003-01-31 00:09:03 FW: Duplicate indexes found in the postgres Database
Previous Message Mitch 2003-01-31 00:00:38 Security Question