Skip site navigation (1) Skip section navigation (2)

Using LIMIT changes index used by planner

From: Sven Willenberger <sven(at)dmv(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Using LIMIT changes index used by planner
Date: 2004-12-13 06:13:43
Message-ID: 41BD3317.9090507@dmv.com (view raw or flat)
Thread:
Lists: pgsql-performance
I have a question regarding a serious performance hit taken when using a 
LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB 
of memory. The table in question contains some 25 million rows with a 
bigserial primary key, orderdate index and a referrer index. The 2 
select statements are as follow:

A) select storelocation,order_number from custacct where referrer = 1365 
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid;

B) select storelocation,order_number from custacct where referrer = 1365 
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid limit 10;

So the only difference is the use of the Limit, which, in theory, should 
be quicker after custacctid is ordered.

Now the analyze results:

A) explain select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid;
 
        QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
    Sort Key: custacctid
    ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..903068.29 rows=19025 width=44)
          Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp 
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp 
without time zone))
          Filter: (referrer = 1365)
(5 rows)

************************

B) explain select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid limit 10;
 
                  QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..33796.50 rows=10 width=44)
    ->  Index Scan using custacct2_pkey on custacct 
(cost=0.00..64297840.86 rows=19025 width=44)
          Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
(3 rows)

*******************

Notice the huge cost difference in the two plans: 904468 in the one 
without LIMIT versus 64297840.86 for the index scan on custacct index. 
Why would the planner switch from using the orderdate index to the 
custacct index (which is a BIGSERIAL, btw)?

I can change that behavior (and speed up the resultant query) by using 
the following subquery:

explain select foo.storelocation, foo.order_number from (select 
storelocation,order_number from custacct where referrer = 1365  and 
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid) as foo  limit 10;
 
              QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=904420.55..904420.67 rows=10 width=100)
    ->  Subquery Scan foo  (cost=904420.55..904658.36 rows=19025 width=100)
          ->  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
                Sort Key: custacctid
                ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..903068.29 rows=19025 width=44)
                      Index Cond: ((orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
                      Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms, when 
running the subselect version to get the limit, it takes 1415.615 ms. 
Running option B (with the other index being scanned) takes several 
minutes (close to 10 minutes!). What am I missing about how the planner 
views the LIMIT statement?

Sven

Responses

pgsql-performance by date

Next:From: Andrew McMillanDate: 2004-12-13 09:56:29
Subject: Re: Using LIMIT changes index used by planner
Previous:From: Josh BerkusDate: 2004-12-13 02:48:27
Subject: Re: Query is not using index when it should

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group