Re: Cost of sort/order by not estimated by the query planner

From: Laurent Laborde <kerdezixe(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Cost of sort/order by not estimated by the query planner
Date: 2009-12-02 13:01:55
Message-ID: 8a1bfe660912020501v38013544q8a2cb9048613d6e4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, Dec 2, 2009 at 1:47 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde <kerdezixe(at)gmail(dot)com> wrote:
>>>                                           QUERY PLAN
>>> -------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..2042.87 rows=5 width=1114)
>>>   ->  Index Scan using _article_pkey on _article
>>> (cost=0.00..7066684.46 rows=17296 width=1114)
>>>         Filter: (bitfield && B'1'::bit varying)
>>
>
> Ah, I missed this the first time around. It's scanning _article_pkey
> here. Ie, it's scanning the table from the oldest to the newest
> article assuming that the values wihch satisfy that constraint are
> evenly distributed and it'll find five of them pretty quickly. In
> reality there's a correlation between this bit being set and the value
> of _article.id and all the ones with it set are towards the end.
> Postgres doesn't have any statistics on how multiple columns are
> related yet so it can't know this.
>
> If this is an important query you might try having an index on
> <bitfield,id> or a partial index on "id where bitfield && B'1' ". The
> latter sounds like what you really need

There is, indeed, a lot of tricks and hacks.
Maybe my question was too confusing.

The question is : why a limit 5 is much much slower than a limit 500 ?

The problem is in the "order by" and not "finding enough the data that
match the filter".
Even if it's not evenly distributed, the queries without "order by"
are much much faster, EVEN when using the "pkey query plan".

without "order by" using the bitmap -> fast
without "order by" using the pkey index -> fast
with "order by" using the bitmap -> fast
with "order by" using the pkey index -> slowwwwwwwwwwwww

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-12-02 13:01:58 Re: operator exclusion constraints
Previous Message Greg Stark 2009-12-02 12:47:31 Re: Cost of sort/order by not estimated by the query planner

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-12-02 13:17:41 Re: Cost of sort/order by not estimated by the query planner
Previous Message Greg Stark 2009-12-02 12:47:31 Re: Cost of sort/order by not estimated by the query planner