Skip site navigation (1) Skip section navigation (2)

LIMIT confuses the planner (again)

From: Kouber Saparev <kouber(at)saparev(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: LIMIT confuses the planner (again)
Date: 2009-09-28 08:43:00
Message-ID: 4AC07714.1080909@saparev.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hello,

I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to 
the one I've already described some time ago:
http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php

Again, adding a LIMIT clause to a query, which is normally executing 
very fast thanks to an index, makes it perform slow, because the planner 
no longer uses the "correct" index.

I have the following table:

CREATE TABLE message (
   message_sid SERIAL PRIMARY KEY,
   from_profile_sid INT NOT NULL REFERENCES profile,
   to_profile_sid INT NOT NULL REFERENCES profile,
   sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
   receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
   body TEXT,
   datetime TIMESTAMP NOT NULL DEFAULT NOW()
);


With the following conditional index:

CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE 
NOT receiver_has_deleted;


The query to obtain the list of received messages of a profile is simple 
and executes very fast, because of the index above:

db=# EXPLAIN ANALYZE SELECT
   *
FROM
   message
WHERE
   to_profile_sid = -1
AND
   NOT receiver_has_deleted
ORDER BY
   message_sid DESC;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=11857.09..11866.19 rows=3640 width=277) (actual 
time=0.317..0.319 rows=15 loops=1)
    Sort Key: message_sid
    Sort Method:  quicksort  Memory: 32kB
    ->  Bitmap Heap Scan on message  (cost=106.44..11641.78 rows=3640 
width=277) (actual time=0.096..0.271 rows=15 loops=1)
          Recheck Cond: ((to_profile_sid = (-1)) AND (NOT 
receiver_has_deleted))
          ->  Bitmap Index Scan on message_to_profile_idx 
(cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21 
loops=1)
                Index Cond: (to_profile_sid = (-1))
  Total runtime: 0.383 ms
(8 rows)


Adding a LIMIT clause to exactly the same query slows its execution more 
than 20'000 times:

db=# EXPLAIN ANALYZE SELECT
   *
FROM
   message
WHERE
   to_profile_sid = -1
AND
   NOT receiver_has_deleted
ORDER BY
   message_sid DESC LIMIT 20;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..6513.60 rows=20 width=277) (actual 
time=0.617..6576.539 rows=15 loops=1)
    ->  Index Scan Backward using message_pkey on message 
(cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522 
rows=15 loops=1)
          Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1)))
  Total runtime: 6576.572 ms
(4 rows)


Just as I was advised in my recent post, I've already increased the 
statistics of both fields all the way till 1000, analyzed the table and 
reindexed the index:

ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000;
ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000;
ANALYZE message;
REINDEX index message_to_profile_idx;


This time, however, the steps above didn't affect the planner in any 
way, it still refuses to use the index "message_to_profile_idx" when a 
LIMIT is involved (for this particular value of to_profile_sid).

Here's some statistical data:

db=# SELECT COUNT(*) FROM message;
   count
---------
  1312213
(1 row)

db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1;
  count
-------
   5604
(1 row)

db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT 
receiver_has_deleted;
  count
-------
     15
(1 row)

db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message;
  count
-------
   8596
(1 row)

db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length 
FROM message GROUP BY to_profile_sid) AS freq;
          avg
----------------------
  152.6540251279664960
(1 row)

db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND 
attname='to_profile_sid';
  n_distinct
------------
        6277
(1 row)


Also, the value of -1 for "to_profile_sid" is second in the list of 
most_common_vals in pg_stats, but still I don't understand why a simple 
limit is blinding the planner for the "good" index. Any ideas?

Regards,
-- 
Kouber Saparev
http://kouber.saparev.com/

Responses

pgsql-performance by date

Next:From: Robert HaasDate: 2009-09-28 11:38:54
Subject: Re: LIMIT confuses the planner (again)
Previous:From: std pikDate: 2009-09-28 06:13:40
Subject: Postgres performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group