Re: [HACKERS] Solution for LIMIT cost estimation

From: Tom Lane <tgl@sss.pgh.pa.us>
To: chris@bitmead.com
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Solution for LIMIT cost estimation
Date: 2000-02-11 03:52:12
Message-ID: 18098.950241132@sss.pgh.pa.us
Lists: pgsql-hackers

Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> ... it sounds like this
> proposal could mean that successive SELECTS with LIMIT could
> execute completely different plans and therefore return inconsistent
> results.
> In short, I think the fact that limit doesn't alter the plan may
> be more of a feature than a bug.

A good point (one I'm embarrassed not to have thought about!) but
I don't think it's a reason not to push forward in this direction.
We have had *way* too many complaints about Postgres not being able
to optimize LIMITed queries, so I'm not willing to ignore the issue;
something's got to be done about it.

As Don B. points out nearby, there's no guarantee anyway that
repeatedly issuing the same query with different LIMIT parameters
will get you consistent results. The only upright and morally
correct approach is to use DECLARE CURSOR followed by FETCH commands
(all within a transaction of course) in order to get results that
are really all part of a single query. Now DECLARE CURSOR is also
presently optimized on the basis of fetching the entire result set,
so that is still a problem --- I neglected to mention before that
I was intending to tweak the optimizer to optimize that case like a
moderate-sized LIMIT.
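
For concreteness, the cursor approach looks something like this (the
table and sort column are invented for illustration):

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM foo ORDER BY f1;
    FETCH 20 FROM c;        -- first chunk
    FETCH 20 FROM c;        -- next chunk, consistent with the first
    CLOSE c;
    COMMIT;

Both FETCHes are served by a single plan inside one transaction, so
the chunks are guaranteed to fit together.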

But having said that, I hear what you're saying and I think it's
worth thinking about. Here are four possible alternative responses:

1. Optimize the query as I sketched previously, but using the "count"
part of the LIMIT spec while deliberately ignoring the "offset".
Then you get consistent results for fetching different chunks of the
query result as long as you use the same count each time. (And as long
as no one else is changing the DB meanwhile, but we'll take that as
read for each of these choices.)

2. Ignore both the count and offset, and optimize any query containing
a LIMIT clause on the basis of some fixed assumption about what fraction
of the tuples will be fetched (I'm thinking something like 1% to 10%).
This allows different fetch sizes to be used without destroying
consistency, but it falls down on the goal of correctly optimizing
"LIMIT 1" hacks.

3. Use the count+offset for all it's worth, and document that you
can't expect to get consistent results from asking for different
LIMITed chunks of the "same" query unless you use ORDER BY to
ensure consistent ordering of the tuples.

4. Fascist variant of #3: make LIMIT without ORDER BY be an error.
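
To make #2 and #3 concrete, here is the sort of thing at stake (table,
column, and index names are made up):

    -- The "LIMIT 1" hack: given an index on logtime, we'd like the
    -- planner to choose a backwards indexscan and stop after one
    -- tuple, instead of seqscan-and-sort over the whole table.
    SELECT * FROM log ORDER BY logtime DESC LIMIT 1;

    -- Chunked fetching under #3: without the ORDER BY, these two
    -- queries could be planned differently and return overlapping,
    -- or non-adjacent, sets of rows.
    SELECT * FROM log ORDER BY logtime LIMIT 20 OFFSET 0;
    SELECT * FROM log ORDER BY logtime LIMIT 20 OFFSET 20;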

SQL92 does not define LIMIT at all, so it's not much help in
deciding what to do. Is there anything in SQL3? What do other
DBMSes do about this issue? Comments, other variants, better ideas,
anyone?

> The other thing is, I would like at some stage to change limit so
> that it is attached to a SELECT rather than an entire query so
> you could...
> SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
> and I'm not sure how this would interact with that.

Since ORDER BY is only allowed at the top level by SQL92, there
would be no way for the user to ensure predictable results from
such a query. I think that'd be a dangerous path to go down.
However, if you had an answer that ensured consistent results from
queries with sub-LIMITs, I don't see that there'd be any problem
with allowing the optimizer to optimize 'em.
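
For instance, a consistent-results rule might require Chris' example to
be written with an inner ORDER BY, which SQL92 does not allow in a
subselect (so this is a sketch, not legal syntax today):

    SELECT * FROM x
    WHERE y IN (SELECT y FROM z ORDER BY y LIMIT 10)
    LIMIT 20;

(And the outer LIMIT would still need an ORDER BY of its own to be
fully deterministic.)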

regards, tom lane
