Re: [HACKERS] Solution for LIMIT cost estimation

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Chris <chris(at)bitmead(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Solution for LIMIT cost estimation
Date: 2000-02-14 13:17:55
Message-ID: 3.0.5.32.20000215001755.03435430@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 23:12 14/02/00 +1100, Chris wrote:
>
>How about this as a compromise:
>
>If you give an offset without an ORDER BY the offset
>is useless if this optimisation is in place. If you
>allowed the offset with the optimisation and no
>order by it would be encouraging broken behaviour.

Not that I would do this necessarily, but

select * from t where <stuff> offset 1 limit 1

is a valid way of checking for duplicates. I would hate to see the
optimization turned off in this case.

>So therefore it would be reasonable to optimise a
>limit,offset query with no order by as if there were
>no offset. This would give consistent results, albeit
>it may not choose the best plan. But at least it
>won't hurt anyone.
...etc

The problem with using a stateless connection is that you have no
transaction control, so can not control table contents between calls. eg.
if it contains:

f
-
abel tasman
charles sturt
ferdinand marcos

(spot the odd one out)

and do a 'select * from t order by f offset 0 limit 2', then someone adds
'bruce stringsteen' and you try to get the next two rows via 'select * from
t order by f offset 2 limit 2', you will get 'charles sturt' again, and
miss bruce.

Either you have to say that the database is almost never updated (ie. it's
just a copy of real data, used for web applications), in which case you can
add all sorts of fields for optimizing stateless calls (notably an ID
field), or you have to implement some kind of state preservation, and dump
ID's into a temporary table or use 'held' cursors, which is not really that
hard [Don't know if PG supports either, but you can 'fake' temporary tables
pretty easily].

I may have missed something in what you need, but someone else has already
mentioned using 'MOVE' within a cursor, and it still seems to me that
putting the optimizer through hoops to achieve the result is liable to be a
problem in the long term.

eg. The Dec/Rdb optimizer actually assesses it's strategy while it's
running. If the query is taking too long, or the estimates it used prove
too inaccurate, it may change strategy. If PG implemented such a thing,
then this whole approach to offset/limit would be blown away - a strategy
will change depending on the data retrieved. It would be a pity if this
sort of improvement in the optimizer were blocked because of problems
caused by breaking successive calls to offset/limit queries.

Maybe either 'held cursors' or 'local temporary tables' could be added to
the ToDo list for a future release.

As to documenting the behaviour, I suspect that any NOTICE should also say
'This behaviour may change in the future - don't rely on it unless you like
living dangerously'.

Just my 0.02c, but I don't like putting limits on an optimizer.

As an aside, and because I like bringing this thing up, stored query
strategies would solve the problem for selected queries; you could specify
the strategy to be used in all executions of a prticular query...maybe this
could go on the ToDo list? ;-}

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Reed 2000-02-14 14:03:34 patch for binding tuples from result set to user allocated memory
Previous Message Bruce Momjian 2000-02-14 12:32:40 Re: Postgres ODBC