From: mark(at)mark(dot)mielke(dot)cc
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?] Re: Optimize ORDER BY ... LIMIT
Date: 2006-09-16 00:30:58
Message-ID: 20060916003058.GA26439@mark.mielke.cc
Lists: pgsql-hackers

On Fri, Sep 15, 2006 at 10:06:16PM +0100, Gregory Stark wrote:
> > I'm curious, as I may be such an offender. What alternatives exist?
> > ...
> > What alternatives to limit/offset exist? If there are thousands or
> > more results, I have trouble with an idea that the entire results
> > should be queried, and cached, displaying only a fraction.

> If you have a unique index and instead of using OFFSET you pass
> along the last key of the previous page then you can use a WHERE
> clause on the indexed column to go straight to the correct page
> rather than using OFFSET. So for example if you're displaying bank
> transactions sorted by transaction_id you have the "next page"
> button pass along the "start_transaction_id=nnn" where nnn is the
> last transaction_id of the previous page. Then on the next page you
> do a query with "WHERE transaction_id > ?" and pass that column. You
> still use ORDER BY transaction_id and LIMIT.
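The keyset technique Gregory describes can be sketched roughly like this (using SQLite via Python purely for illustration; the table and column names are invented, not from any real schema):

```python
import sqlite3

# Hypothetical transactions table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 101)])

PAGE_SIZE = 20

def next_page(last_transaction_id):
    """Fetch the page after last_transaction_id using a keyset predicate
    instead of OFFSET, so the index seek goes straight to the right spot."""
    return conn.execute(
        "SELECT transaction_id, amount FROM transactions"
        " WHERE transaction_id > ? ORDER BY transaction_id LIMIT ?",
        (last_transaction_id, PAGE_SIZE)).fetchall()

page1 = next_page(0)             # first page: ids 1..20
page2 = next_page(page1[-1][0])  # "next page" carries the last id forward
print(page2[0][0], page2[-1][0])  # -> 21 40
```

The "next page" link only needs to carry the last key of the current page, never a row count.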

I found benefits to doing things this way that were not related to
performance. If the number of items leading up to your page changes,
remembering the offset can result in listing a completely different
page than you intended when paging forward or backward. On my pages,
I prefer to define one of the items as the item I am looking at, and
page seeking is always +/- 1 page from that item. This means I am
pretty close to what you are suggesting, except that, since I do it
for functional reasons rather than performance reasons, I am doing
something worse.
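The drift problem described above is easy to demonstrate with a toy example (SQLite via Python; the table name is made up). When a row appears earlier in the sort order, an OFFSET-based page silently shifts, while a keyset-anchored page stays put:

```python
import sqlite3

# Hypothetical items table, ids 10..59.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(10, 60)])

def by_offset(off, n=5):
    return [r[0] for r in conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?", (n, off))]

def by_key(last_id, n=5):
    return [r[0] for r in conn.execute(
        "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?", (last_id, n))]

page_off = by_offset(10)   # ids 20..24
page_key = by_key(19)      # same page, anchored on the previous page's last id

# Another session inserts a row earlier in the ordering:
conn.execute("INSERT INTO items VALUES (5)")

print(by_offset(10))  # shifted: [19, 20, 21, 22, 23]
print(by_key(19))     # unchanged: [20, 21, 22, 23, 24]
```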

I use COUNT(*) with a WHERE clause as you describe above to map this
identifier to an offset, and then a second SELECT with LIMIT/OFFSET to
fetch the object and those that follow on the page.
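As I read it, that two-query scheme amounts to something like the following sketch (Python/SQLite, invented names). The first query still has to count every earlier index entry, which is the part keyset pagination avoids:

```python
import sqlite3

# Hypothetical items table, ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 101)])

def page_for(anchor_id, n=20):
    # Query 1: map the anchor identifier to an offset
    # (scans all index entries preceding the anchor).
    (offset,) = conn.execute(
        "SELECT COUNT(*) FROM items WHERE id < ?", (anchor_id,)).fetchone()
    # Query 2: fetch the anchor row and those that follow on its page.
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (n, offset)).fetchall()
    return offset, [r[0] for r in rows]

offset, ids = page_for(41)
print(offset, ids[0], ids[-1])  # -> 40 41 60
```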

If I follow your suggestion, I should track the identifier along with
the last offset, display the offset to the user for information
purposes only (never using it in a query), and then page with WHERE
and LIMIT?

I tried this out. For a random offset=200, limit=20 case, EXPLAIN
ANALYZE tells me the simple change takes it from index scanning 207
rows to find 7 rows, down to index scanning 7 rows to find 7 rows.
Sweet. Unfortunately, the time to complete is unchanged, at around
1.3+/-0.2 milliseconds. Looks like my system has bigger
bottlenecks. :-)

Thanks,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
