Using LIMIT changes index used by planner

From: Sven Willenberger <sven(at)dmv(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Using LIMIT changes index used by planner
Date: 2004-12-13 06:13:43
Message-ID: 41BD3317.9090507@dmv.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a question regarding a serious performance hit taken when using a
LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB
of memory. The table in question contains some 25 million rows with a
bigserial primary key, orderdate index and a referrer index. The 2
select statements are as follow:

A) select storelocation,order_number from custacct where referrer = 1365
and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid;

B) select storelocation,order_number from custacct where referrer = 1365
and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid limit 10;

So the only difference is the use of the Limit, which, in theory, should
be quicker after custacctid is ordered.

Now the analyze results:

A) explain select storelocation,order_number from custacct where
referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=904420.55..904468.11 rows=19025 width=44)
Sort Key: custacctid
-> Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp
without time zone))
Filter: (referrer = 1365)
(5 rows)

************************

B) explain select storelocation,order_number from custacct where
referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..33796.50 rows=10 width=44)
-> Index Scan using custacct2_pkey on custacct
(cost=0.00..64297840.86 rows=19025 width=44)
Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
(3 rows)

*******************

Notice the huge cost difference in the two plans: 904468 in the one
without LIMIT versus 64297840.86 for the index scan on custacct index.
Why would the planner switch from using the orderdate index to the
custacct index (which is a BIGSERIAL, btw)?

I can change that behavior (and speed up the resultant query) by using
the following subquery:

explain select foo.storelocation, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365 and
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid) as foo limit 10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=904420.55..904420.67 rows=10 width=100)
-> Subquery Scan foo (cost=904420.55..904658.36 rows=19025 width=100)
-> Sort (cost=904420.55..904468.11 rows=19025 width=44)
Sort Key: custacctid
-> Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
Index Cond: ((orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms, when
running the subselect version to get the limit, it takes 1415.615 ms.
Running option B (with the other index being scanned) takes several
minutes (close to 10 minutes!). What am I missing about how the planner
views the LIMIT statement?

Sven

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew McMillan 2004-12-13 09:56:29 Re: Using LIMIT changes index used by planner
Previous Message Josh Berkus 2004-12-13 02:48:27 Re: Query is not using index when it should