Cost of sort/order by not estimated by the query planner

From: Laurent Laborde <kerdezixe(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Cost of sort/order by not estimated by the query planner
Date: 2009-11-30 16:54:03
Message-ID: 8a1bfe660911300854h63ffb39anfeb667da662cb37c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Friendly greetings !
I use postgresql 8.3.6.

here is a few info about the table i'm querying :
-------------------------------------------------------------
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729

Here are both request with problems :
--------------------------------------------------

QUERY 1 : Very fast !
-------------

explain SELECT *
FROM _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 500;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Limit (cost=66114.13..66115.38 rows=500 width=1114)
-> Sort (cost=66114.13..66157.37 rows=17296 width=1114)
Sort Key: id
-> Bitmap Heap Scan on _article (cost=138.32..65252.29
rows=17296 width=1114)
Recheck Cond: (bitfield && B'1'::bit varying)
-> Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.00 rows=17296 width=0)
Index Cond: (bitfield && B'1'::bit varying)

QUERY 2 : Endless ... (more than 30mn... i stopped the query)
-------------

explain SELECT *
FROM _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit (cost=0.00..2042.87 rows=5 width=1114)
-> Index Scan using _article_pkey on _article
(cost=0.00..7066684.46 rows=17296 width=1114)
Filter: (bitfield && B'1'::bit varying)
(3 rows)

With LIMIT 5 and LIMIT 500, the query plan are differents.
Postgresql estimate that it can do a a simple index scan to find only 5 row.
With more than LIMIT ~400 it estimate that it's faster to do a more
complex plan.
and it make sense !

The problem is in the order by, of course.
If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
index scan.
At limit 500 (without order) it still use an index scan and it is
slightly slower.
At limit 5000 (without order) it switch to a Bitmap Index Scan +
Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
Sort/ORDER BY ?
Of course, by ignoring the order, both query plan are right and the
choice for thoses differents plans totally make sense.

But... if the planner would be kind enough to considerate the cost of
the order by, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the limit 5.
And not an index_scan pkey !

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a vacuum analyze), it doesn't change anything.

Is that a bug ? any Idea ?

Thank you :)

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-11-30 17:21:20 Re: enable-thread-safety defaults?
Previous Message Jaime Casanova 2009-11-30 16:53:06 Re: set the cost of an aggregate function

Browse pgsql-performance by date

  From Date Subject
Next Message Denis Lussier 2009-11-30 17:12:21 Re: Server Freezing
Previous Message Bruce Momjian 2009-11-30 16:40:11 Re: SSD + RAID