Re: [HACKERS] Solution for LIMIT cost estimation

From: Chris Bitmead <chrisb(at)nimrod(dot)itg(dot)telstra(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Solution for LIMIT cost estimation
Date: 2000-02-10 23:01:46
Message-ID: 38A3435A.B6212DB8@nimrod.itg.telecom.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


A couple of things occur to me. One is that it sounds like this
proposal could mean that successive SELECTS with LIMIT could
execute completely different plans and therefore return inconsistent
results. For example, let's say I have 26 customers a through z.
My first call to SELECT name from customer limit 3 might return...
a
b
c
and then my next SELECT name from customer limit 3, 3 might return
a
b
c
again, when I might expect d e f. Of course in this case I could SORT,
but unless name is a unique field that won't work. I could sort on oid,
but that is expensive if I don't really want it. I could use a cursor,
but web pages don't like to do that because you don't know how long you
may need to keep the cursor open.

In short, I think the fact that limit doesn't alter the plan may
be more of a feature than a bug.

The other thing is, I would like at some stage to change limit so
that it is attached to a SELECT rather than an entire query so
you could...
SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
and I'm not sure how this would interact with that.

Tom Lane wrote:
>
> We have discussed in the past the need for the optimizer to take LIMIT
> into account when choosing plans. Currently, since planning is done
> on the basis of total plan cost for retrieving all tuples, there's
> no way to prefer a "fast start" plan over "slow start". But that's
> what you want if there's a small LIMIT.
>
> Up to now I haven't seen a practical way to fix this, because the
> optimizer does its planning bottom-up (start with scan plans, then
> make joins, etc) and there's no easy way to know at the bottom of
> the pyramid whether you can expect to take advantage of a LIMIT that
> exists at the top. For example, if there's a SORT or GROUP step
> in between, you can't apply the LIMIT to the bottom level; but the
> bottom guys don't know whether there will be such a step.
>
> I have thought of a fairly clean way to attack this problem, which
> is to represent the cost of a plan in two parts instead of only one.
> Instead of just "cost", have "startup cost" and "cost per tuple".
> (Actually, it'd probably be easier to work with "startup cost" and
> "total cost if all tuples are retrieved", but that's a trivial
> representational detail; the point is that our cost model will now be
> of the form a*N+b when N tuples are retrieved.) It'd be pretty easy
> to produce plausible numbers for all the plan types we use. Then,
> the plan comparators would keep any plan that wins on either startup
> or total cost, rather than only considering total cost. Finally
> at the top level of planning, when there is a LIMIT the preferred
> plan would be selected by comparing a*LIMIT+b rather than total cost.
>
> I think I can get this done before beta, but before I go into hack-
> attack mode I wanted to run it up the flagpole and see if anyone
> has any complaints or better ideas.
>
> regards, tom lane
>
> ************

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Bitmead 2000-02-10 23:15:28 Re: AW: AW: [HACKERS] Another nasty cache problem
Previous Message Thomas Lockhart 2000-02-10 22:15:24 Re: [HACKERS] Re: ECPG documentation