Re: Optimize ORDER BY ... LIMIT

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize ORDER BY ... LIMIT
Date: 2006-09-15 21:06:16
Message-ID: 874pv8q1pz.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


mark(at)mark(dot)mielke(dot)cc writes:

> On Fri, Sep 15, 2006 at 08:22:50PM +0100, Gregory Stark wrote:
>
> I'm curious, as I may be such an offender. What alternatives exist?
>
> I think you mean the concept of showing a page of information that
> you can navigate forwards and backwards a page, or select a range.
>
> What alternatives to limit/offset exist? If there are thousands or
> more results, I have trouble with an idea that the entire results
> should be queried, and cached, displaying only a fraction.
>
> I think most or all of the times I do this, an index is available,
> so perhaps that is why I don't find this issue problematic?

If you have a unique index and instead of using OFFSET you pass along the last
key of the previous page then you can use a WHERE clause on the indexed column
to go straight to the correct page rather than using OFFSET.

So for example if you're displaying bank transactions sorted by transaction_id
you have the "next page" button pass along the "start_transaction_id=nnn"
where nnn is the last transaction_id of the previous page. Then on the next
page you do a query with "WHERE transaction_id > ?" and pass that column. You
still use ORDER BY transaction_id and LIMIT.

This has the upside that your query always takes the same amount of time.

Using OFFSET means later pages take longer, possibly much longer, than earlier
pages. Possibly so much longer that it causes enough i/o to bring down your
web server etc.

It does have lots of downsides as well. You cannot provide direct links to the
later pages aside from the next page. It's difficult to provide a proper
"previous page" button. etc. Early in the web's history these were reasonable
trade-offs but nowadays it's hard to convince people that their $bazillion
machine can't handle sorting a couple thousand records for each page view and
they should sacrifice the user experience for that. So I've pretty much given
up on that battle.

> For implementation, I think something simple is best:
[...]

You just described using an insertion sort. Even if I went with insertion sort
instead of heap sort I think it makes sense to do it inside tuplesort.

> If X+Y tuples would take up too much memory, this plan should be
> skipped, and the general routines used instead.

And that's a big part of why...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2006-09-15 21:17:48 Re: [HACKERS] plpgsql, return can contains any expression
Previous Message Guillaume Smet 2006-09-15 21:01:55 Re: Release notes