The usual sequential scan, but with LIMIT !

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: The usual sequential scan, but with LIMIT !
Date: 2004-09-06 12:15:44
Message-ID: opsdwm0iujcq72hf@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance


Hello,

I have this table :
CREATE TABLE apparts
(
id SERIAL NOT NULL PRIMARY KEY,

price FLOAT NOT NULL,
surface INTEGER NOT NULL,
price_sq FLOAT NOT NULL,

rooms INTEGER NULL,
vente BOOL NOT NULL,
category TEXT NOT NULL,
zipcode INTEGER NOT NULL,
departement INTEGER NOT NULL
) WITHOUT OIDS;

There is a BTREE index on 'departement'.
The table fits in RAM.

When I want to SELECT according to my indexed field, postgres chooses a
sequential scan unless the number of rows to be returned is very, very
small :

apparts=> explain analyze select * from apparts where departement=42;
Seq Scan on apparts (cost=0.00..853.12 rows=1403 width=47) (actual
time=5.094..52.026 rows=1516 loops=1)
Filter: (departement = 42)
Total runtime: 52.634 ms

OK, it returns 1516 rows, so maybe the seq scan is right.

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42;
Index Scan using apparts_dept on apparts (cost=0.00..1514.59 rows=1403
width=47) (actual time=0.045..2.770 rows=1516 loops=1)
Index Cond: (departement = 42)
Total runtime: 3.404 ms

Um, 15 times faster...

Index scan is called only when there are few rows. With other values for
'departement' where there are few rows, the Index is used automatically.
This is logical, even if I should adjust the page costs. I wish I could
tell postgres "this table will fit in RAM and be accessed often, so for
this table, the page seek cost should be very low".

Everything is vacuum full analyze.

Now, if I LIMIT the query to 10 rows, the index should be used all the
time, because it will always return few rows... well, it doesn't !

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT
10;
Limit (cost=0.00..6.08 rows=10 width=47) (actual time=5.003..5.023
rows=10 loops=1)
-> Seq Scan on apparts (cost=0.00..853.12 rows=1403 width=47) (actual
time=4.998..5.013 rows=10 loops=1)
Filter: (departement = 42)
Total runtime: 5.107 ms

Now, let's try :

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT
10;
Limit (cost=0.00..10.80 rows=10 width=47) (actual time=0.047..0.072
rows=10 loops=1)
-> Index Scan using apparts_dept on apparts (cost=0.00..1514.59
rows=1403 width=47) (actual time=0.044..0.061 rows=10 loops=1)
Index Cond: (departement = 42)
Total runtime: 0.157 ms

So, by itself, Postgres will select a very bad query plan (32x slower) on
a query which would be executed very fast using indexes. If I use OFFSET
+ LIMIT, it only gets worse because the seq scan has to scan more rows :

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT
10 OFFSET 85;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=51.69..57.77 rows=10 width=47) (actual time=10.224..10.246
rows=10 loops=1)
-> Seq Scan on apparts (cost=0.00..853.12 rows=1403 width=47) (actual
time=5.254..10.200 rows=95 loops=1)
Filter: (departement = 42)
Total runtime: 10.326 ms

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT
10 OFFSET 1000;
Limit (cost=608.07..614.15 rows=10 width=47) (actual time=43.993..44.047
rows=10 loops=1)
-> Seq Scan on apparts (cost=0.00..853.12 rows=1403 width=47) (actual
time=5.328..43.791 rows=1010 loops=1)
Filter: (departement = 42)
Total runtime: 44.128 ms

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT
10 OFFSET 1000;
Limit (cost=1079.54..1090.33 rows=10 width=47) (actual time=2.147..2.170
rows=10 loops=1)
-> Index Scan using apparts_dept on apparts (cost=0.00..1514.59
rows=1403 width=47) (actual time=0.044..1.860 rows=1010 loops=1)
Index Cond: (departement = 42)
Total runtime: 2.259 ms

Why is it that way ? The planner should use the LIMIT values when
planning the query, should it not ?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-09-06 12:27:06 Re: The usual sequential scan, but with LIMIT !
Previous Message Björn Voigt 2004-09-06 11:57:11 Clustering postgresql

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-09-06 12:27:06 Re: The usual sequential scan, but with LIMIT !
Previous Message Tom Lane 2004-09-06 02:18:05 Re: Tanking a server with shared memory