
Re: The usual sequential scan, but with LIMIT !

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: The usual sequential scan, but with LIMIT !
Date: 2004-09-07 06:51:54
Message-ID: opsdx2os1ncq72hf@musicbox
Lists: pgsql-general pgsql-performance
	OK, thanks a lot for your explanations. Knowing how the planner "thinks"  
makes it pretty logical. Thank you.

	Now another question...

	I have a table of records representing forum posts with a primary key  
(id), a topic_id, a timestamp, and other fields which I won't detail. I  
want to split them into pages (like forums usually do), with N posts per  
page. In that case :

	SELECT * FROM table WHERE topic_id=... ORDER BY post_timestamp asc LIMIT  
N OFFSET N*page;

	Also, since id is a serial, it's almost equivalent to order by id rather  
than post_timestamp :

	SELECT * FROM table WHERE topic_id=... ORDER BY id asc LIMIT N OFFSET  
N*page;

	This query runs slower and slower as the OFFSET grows, which is a problem  
because the most accessed page in a forum is the last one.

	So, for the last page, I tried :
	SELECT * FROM table WHERE topic_id=... ORDER BY id desc LIMIT N;
	But this does not use the index at all (seq scan + sort + limit).

	My solution is simple : build an index on (-id), or on (some reference  
date)-post_timestamp, then :
	SELECT * FROM table WHERE topic_id=... ORDER BY (-id) asc LIMIT N;
	(note that ordering by (-id) ascending means id descending, i.e. newest  
posts first).
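
	For instance (a sketch with a hypothetical posts table; I include  
topic_id in the index so that both the WHERE clause and the sort are  
served by a single forward index scan) :

	CREATE INDEX posts_topic_negid ON posts (topic_id, (-id));

	-- last N posts of a topic, newest first, via the index :
	SELECT * FROM posts WHERE topic_id=123
		ORDER BY topic_id, (-id) LIMIT 10;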

	Then the last page is the fastest one, but it always shows N posts.  
That's not a problem, so I guess I'll use that. I don't like forums which  
show 1 post on the last page because the number of posts modulo N is 1.
	I may store the number of posts in a topic (updated by a trigger) to  
avoid costly COUNT queries for paging; then I could use ORDER BY id for  
the first half of the pages, and ORDER BY (-id) for the rest, so every  
page will be fast.
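
	Such a trigger could look like this (a sketch; the topics(id,  
post_count) table and all names here are hypothetical) :

	CREATE FUNCTION update_post_count() RETURNS trigger AS '
	BEGIN
		IF TG_OP = ''INSERT'' THEN
			UPDATE topics SET post_count = post_count + 1
				WHERE id = NEW.topic_id;
		ELSE
			UPDATE topics SET post_count = post_count - 1
				WHERE id = OLD.topic_id;
		END IF;
		RETURN NULL;  -- return value is ignored for AFTER triggers
	END;
	' LANGUAGE plpgsql;

	CREATE TRIGGER posts_count AFTER INSERT OR DELETE ON posts
		FOR EACH ROW EXECUTE PROCEDURE update_post_count();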

	I could even create a pages table to store the id of the first post on  
that page and then :
	SELECT * FROM table WHERE topic_id=... AND id>id_of_first_post_in_page  
ORDER BY id asc LIMIT N;
	then all pages would be equally fast.
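
	For example, with a hypothetical pages(topic_id, page, first_post_id)  
table (using >= since first_post_id is the first post *of* the page),  
page 5 of topic 123 would be :

	SELECT * FROM posts
		WHERE topic_id=123
		AND id >= (SELECT first_post_id FROM pages
			WHERE topic_id=123 AND page=5)
		ORDER BY id ASC LIMIT 10;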

	Or, I could cache the query results for all pages but the last one.

	Finally, the question : with a multiple-field btree, it is no harder  
to scan it in "desc order" than in "asc order". So why doesn't Postgres  
do it ? Here is a btree example :

	topic_id	id
	1		1
	1		10
	2		2
	2		5
	2		17
	3		4
	3		6

	suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id ASC.
	Postgres simply finds the first row with topic_id=2 and goes from there.
	
	suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id DESC.
	Postgres does a seq scan, but it could think a bit more and start at  
the "first index node which has topic_id>2" (simple to find in a btree),  
then go backwards in the index. This can be generalized to any  
combination of (asc, desc).

	I made some more experiments, and saw that Postgres does an 'Index  
Scan' when the ORDER BY clauses are all ASC, and an 'Index Scan Backward'  
when they are all DESC. However, it does not seem to handle a mix of ASC  
and DESC.

	What do you think of this ?


On Mon, 06 Sep 2004 12:40:41 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?=  
> <lists(at)boutiquenumerique(dot)com> writes:
>> Now, if I LIMIT the query to 10 rows, the index should be used all the
>> time, because it will always return few rows... well, it doesn't !
>
> Not at all.  From the planner's point of view, the LIMIT is going to
> reduce the cost by about a factor of 10/1403, since the underlying plan
> step will only be run partway through.  That's not going to change the
> decision about which underlying plan step is cheapest: 10/1403 of a
> cheaper plan is still always less than 10/1403 of a more expensive plan.
>
> Later, you note that LIMIT with ORDER BY does affect the plan choice
> --- that's because in that situation one plan alternative has a much
> higher startup cost than the other (namely the cost of a sort step).
> A small LIMIT can allow the fast-startup plan to be chosen even though
> it would be estimated to be the loser if run to completion.
>
> 			regards, tom lane

