Skip site navigation (1) Skip section navigation (2)

Cost of sort/order by not estimated by the query planner

From: Laurent Laborde <kerdezixe(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Cost of sort/order by not estimated by the query planner
Date: 2009-11-30 16:54:03
Message-ID: 8a1bfe660911300854h63ffb39anfeb667da662cb37c@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Friendly greetings !
I use postgresql 8.3.6.

here is a few info about the table i'm querying :
-------------------------------------------------------------
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729


Here are both request with problems :
--------------------------------------------------

QUERY 1 : Very fast !
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 500;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Limit  (cost=66114.13..66115.38 rows=500 width=1114)
   ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
         Sort Key: id
         ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
rows=17296 width=1114)
               Recheck Cond: (bitfield && B'1'::bit varying)
               ->  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.00 rows=17296 width=0)
                     Index Cond: (bitfield && B'1'::bit varying)




QUERY 2 : Endless ... (more than 30mn... i stopped the query)
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 5;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2042.87 rows=5 width=1114)
   ->  Index Scan using _article_pkey on _article
(cost=0.00..7066684.46 rows=17296 width=1114)
         Filter: (bitfield && B'1'::bit varying)
(3 rows)


With LIMIT 5 and LIMIT 500, the query plan are differents.
Postgresql estimate that it can do a a simple index scan to find only 5 row.
With more than LIMIT ~400 it estimate that it's faster to do a more
complex plan.
and it make sense !

The problem is in the order by, of course.
If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
index scan.
At limit 500 (without order) it still use an index scan and it is
slightly slower.
At limit 5000 (without order) it switch to a Bitmap Index Scan +
Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
Sort/ORDER BY ?
Of course, by ignoring the order, both query plan are right and the
choice for thoses differents plans totally make sense.

But... if the planner would be kind enough to considerate the cost of
the order by, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the limit 5.
And not an index_scan pkey !

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a vacuum analyze), it doesn't change anything.

Is that a bug ? any Idea ?

Thank you :)

-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

Responses

pgsql-performance by date

Next:From: Denis LussierDate: 2009-11-30 17:12:21
Subject: Re: Server Freezing
Previous:From: Bruce MomjianDate: 2009-11-30 16:40:11
Subject: Re: SSD + RAID

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2009-11-30 17:21:20
Subject: Re: enable-thread-safety defaults?
Previous:From: Jaime CasanovaDate: 2009-11-30 16:53:06
Subject: Re: set the cost of an aggregate function

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group