Re: Using LIMIT changes index used by planner

From: Sven Willenberger <sven(at)dmv(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, andrew(at)catalyst(dot)net(dot)nz
Subject: Re: Using LIMIT changes index used by planner
Date: 2004-12-14 18:28:52
Message-ID: 1103048932.5534.22.camel@lanshark.dmv.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 2004-12-13 at 17:43 -0500, Tom Lane wrote:
> Sven Willenberger <sven(at)dmv(dot)com> writes:
> > explain analyze select storelocation,order_number from custacct where
> > referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07
> > 12:00:00' order by custacctid limit 10;
>
> > QUERY PLAN
>
> > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.00..43065.76 rows=10 width=43) (actual
> > time=1306957.216..1307072.111 rows=10 loops=1)
> > -> Index Scan using custacct2_pkey on custacct
> > (cost=0.00..92083209.38 rows=21382 width=43) (actual
> > time=1306957.205..1307072.017 rows=10 loops=1)
> > Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
> > 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
> > 12:00:00'::timestamp without time zone))
> > Total runtime: 1307072.231 ms
> > (4 rows)
>
> I think this is the well-known issue of lack of cross-column correlation
> statistics. The planner is well aware that this indexscan will be
> horridly expensive if run to completion ---
> <snip>
> There isn't any near-term fix in the wind for this, since storing
> cross-column statistics is an expensive proposition that we haven't
> decided how to handle. Your workaround with separating the ORDER BY
> from the LIMIT is a good one.
>

You are correct in that there is a high degree of correlation between
the custacctid (which is a serial key) and the orderdate as the orders
generally get entered in the order that they arrive. I will go with the
workaround subselect query plan then.

On a related note, is there a way (other than set enable_seqscan=off) to
give a hint to the planner that it is cheaper to use and index scan
versus seq scan? Using the "workaround" query on any time period greater
than 12 hours results in the planner using a seq scan. Disabling the seq
scan and running the query on a full day period for example shows:

explain analyze select foo.storelocaion, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365 and
ordertdate between '2004-12-09' and '2004-12-10' order by custacctid) as
foo limit 10 offset 100;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2661326.22..2661326.35 rows=10 width=100) (actual
time=28446.605..28446.796 rows=10 loops=1)
-> Subquery Scan foo (cost=2661324.97..2661866.19 rows=43297
width=100) (actual time=28444.916..28446.298 rows=110 loops=1)
-> Sort (cost=2661324.97..2661433.22 rows=43297 width=41)
(actual time=28444.895..28445.334 rows=110 loops=1)
Sort Key: custacctid
-> Index Scan using orderdate_idx on custacct
(cost=0.00..2657990.68 rows=43297 width=41) (actual
time=4.432..28145.212 rows=44333 loops=1)
Index Cond: ((orderdate >= '2004-12-09
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-10
00:00:00'::timestamp without time zone))
Filter: (referrer = 1365)
Total runtime: 28456.893 ms
(8 rows)

If I interpret the above correctly, the planner guestimates a cost of
2661326 but the actual cost is much less (assuming time is equivalent to
cost). Would the set statistics command be of any benefit here in
"training" the planner?

Sven

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-12-14 19:35:17 Re: Using LIMIT changes index used by planner
Previous Message Markus Schaber 2004-12-14 18:20:06 Re: Anything to be gained from a 'Postgres Filesystem'?