Quick Links

Re: [HACKERS] Solution for LIMIT cost estimation

From:	"Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Solution for LIMIT cost estimation
Date:	2000-02-10 17:50:33
Message-ID:	20000210115033.B8742@rice.edu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom -
This would fit in really well with some ideas I've been working on
with remote table access. Mariposa makes the design decision that
anything might be remote, and the optimizer doesn't get to know what
is and what isn't, so they run the single-site optimization, basically
assuming everything is local, then fragment the plan and ship parts off
for remote execution.

I've been thinking about ways to let the optimizer know about remote
tables, and the capabilites of the server serving the remote table.
I won't fill in the details here. I'm writing up a proposal for how
this will all work: I've got a telecon today to gauge administrative
support for developing this during work hours, which will _dramatically_
speed up it's implementation ;-)

It does seem to me, however, that the cost of going remote is best
described with the a*N+b model you're describing here: for remote tables,
b will be large, and dominate, unless you're pulling a _lot_ of tuples.

Suffice to say, sounds great to me.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm(at)rice(dot)edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

On Thu, Feb 10, 2000 at 11:48:15AM -0500, Tom Lane wrote:
> We have discussed in the past the need for the optimizer to take LIMIT
> into account when choosing plans. Currently, since planning is done
> on the basis of total plan cost for retrieving all tuples, there's
> no way to prefer a "fast start" plan over "slow start". But that's
> what you want if there's a small LIMIT.
>
> Up to now I haven't seen a practical way to fix this, because the
> optimizer does its planning bottom-up (start with scan plans, then
> make joins, etc) and there's no easy way to know at the bottom of
> the pyramid whether you can expect to take advantage of a LIMIT that
> exists at the top. For example, if there's a SORT or GROUP step
> in between, you can't apply the LIMIT to the bottom level; but the
> bottom guys don't know whether there will be such a step.
>
> I have thought of a fairly clean way to attack this problem, which
> is to represent the cost of a plan in two parts instead of only one.
> Instead of just "cost", have "startup cost" and "cost per tuple".
> (Actually, it'd probably be easier to work with "startup cost" and
> "total cost if all tuples are retrieved", but that's a trivial
> representational detail; the point is that our cost model will now be
> of the form a*N+b when N tuples are retrieved.) It'd be pretty easy
> to produce plausible numbers for all the plan types we use. Then,
> the plan comparators would keep any plan that wins on either startup
> or total cost, rather than only considering total cost. Finally
> at the top level of planning, when there is a LIMIT the preferred
> plan would be selected by comparing a*LIMIT+b rather than total cost.
>
> I think I can get this done before beta, but before I go into hack-
> attack mode I wanted to run it up the flagpole and see if anyone
> has any complaints or better ideas.
>
> regards, tom lane
>
> ************

In response to

Solution for LIMIT cost estimation at 2000-02-10 16:48:15 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Lamar Owen	2000-02-10 17:51:35	Re: [HACKERS] createdb default arguments
Previous Message	Peter Eisentraut	2000-02-10 17:23:36	Re: [HACKERS] psql and libpq fixes