Re: The usual sequential scan, but with LIMIT !

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: The usual sequential scan, but with LIMIT !
Date: 2004-09-07 06:51:54
Message-ID: opsdx2os1ncq72hf@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance


OK, thanks a lot for your explanations. Knowing how the planner "thinks",
makes it pretty logical. Thank you.

Now another question...

I have a table of records representing forum posts with a primary key
(id), a topic_id, a timestamp, and other fields which I won't detail. I
want to split them into pages (like forums usually do), with N posts per
page. In that case :

SELECT * FROM table WHERE topic_id=... ORDER BY post_timestamp asc LIMIT
N OFFSET N*page;

Also it's almost the same to order by id rather than post_timestamp (id
being a serial).

SELECT * FROM table WHERE topic_id=... ORDER BY id asc LIMIT N OFFSET
N*page;

This query runs slower and slower as the OFFSET grows, which is a problem
because the most accessed page in a forum is the last one.

So, for the last page, I tried :
SELECT * FROM table WHERE topic_id=... ORDER BY id desc LIMIT N;
But this does not use the index at all (seq scan + sort + limit).

My solution is simple : build an index on (-id), or on (some
date)-post_timestamp, then :
SELECT * FROM table WHERE topic_id=... ORDER BY (-id) desc LIMIT N;

Then the last page is the fastest one, but it always shows N posts.
That's not a problem, so I guess I'll use that. I don't like forums which
show 1 post on the last page because the number of posts modulo N is 1.
I may store the number of posts in a forum (updated by a trigger) to
avoid costly COUNT queries to count the pages, so I could use ORDER BY id
for the first half of the pages, and ORDER BY (-id) for the rest, so it
will always be fastest.

I could even create a pages table to store the id of the first post on
that page and then :
SELECT * FROM table WHERE topic_id=... AND id>id_of_first_post_in_page
ORDER BY id asc LIMIT N;
then all pages would be aqually fast.

Or, I could cache the query results for all pages but the last one.

Finally, the question : having a multiple field btree, it is not harder
to scan it in "desc order" than in "asc order". So why does not Postgres
do it ? Here is a btree example :

topic_id id
1 1
1 10
2 2
2 5
2 17
3 4
3 6

suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id ASC.
Postgres simply finds the first row with topic_id=2 and goes from there.

suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id DESC.
Postgres does a seq scan, but it could think a bit more and start at
"first index node which has topic_id>2" (simple to find in a btree) then
go backwards in the index. This can ge beneralized to any combination of
(asc,desc).

I made some more experiments, and saw Postgres does an 'Index Scan' when
ORDER BY clauses are all ASC, and an 'Index Scan Backwards' when all ORDER
BY are DESC. However, it does not handle a combination of ASC and DESC?

What do you think of this ?

On Mon, 06 Sep 2004 12:40:41 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?=
> <lists(at)boutiquenumerique(dot)com> writes:
>> Now, if I LIMIT the query to 10 rows, the index should be used all the
>> time, because it will always return few rows... well, it doesn't !
>
> Not at all. From the planner's point of view, the LIMIT is going to
> reduce the cost by about a factor of 10/1403, since the underlying plan
> step will only be run partway through. That's not going to change the
> decision about which underlying plan step is cheapest: 10/1403 of a
> cheaper plan is still always less than 10/1403 of a more expensive plan.
>
> Later, you note that LIMIT with ORDER BY does affect the plan choice
> --- that's because in that situation one plan alternative has a much
> higher startup cost than the other (namely the cost of a sort step).
> A small LIMIT can allow the fast-startup plan to be chosen even though
> it would be estimated to be the loser if run to completion.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Paramveer.Singh 2004-09-07 07:35:31 resilient transactions
Previous Message Valerie Schneider DSI/DEV 2004-09-07 06:46:45 Pb with ecpg and include file on PG 8.0.0

Browse pgsql-performance by date

  From Date Subject
Next Message G u i d o B a r o s i o 2004-09-07 10:41:07 TOAST tables, cannot truncate
Previous Message Tom Lane 2004-09-06 16:40:41 Re: The usual sequential scan, but with LIMIT !