Re: When are index scans used over seq scans?

From: Richard van den Berg <richard(dot)vandenberg(at)trust-factory(dot)com>
To: John A Meinel <john(at)arbash-meinel(dot)com>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: When are index scans used over seq scans?
Date: 2005-04-21 12:14:26
Message-ID: 42679922.3040300@trust-factory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks a lot John for the correct search terms. :-)

The suggestion in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00029.php to
add a constraint that checks (finishtime >= starttime) does not make a
difference for me. Still seq scans are used.

The width solution explained in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00027.php
and
http://archives.postgresql.org/pgsql-performance/2005-04/msg00116.php
does make a huge difference when selecting 1 timestamp using a BETWEEN
(2ms vs 2sec), but as soon as I put 2 timestamps in a table and try a
join, everything goes south (7.7sec). I have 10k timestamps in the
duration table. :-(

I'm getting more confused on how the planner decides to use indexes. For
example, if I try:

explain analyze select us.oid from sessions us where '2005-04-10
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using sessions_st_ft_idx2 on sessions us
(cost=0.00..18320.73 rows=4765 width=4) (actual time=0.063..2.455
rows=279 loops=1)
Index Cond: (('2005-04-10 23:11:00'::timestamp without time zone <=
finishtimetrunc) AND ('2005-04-10 23:11:00'::timestamp without time zone
>= starttimetrunc))
Total runtime: 2.616 ms

is uses the index! However, if I change the date it does not:

explain analyze select us.oid from sessions us where '2005-04-09
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sessions us (cost=0.00..68173.04 rows=41575 width=4)
(actual time=553.424..1981.695 rows=64 loops=1)
Filter: (('2005-04-09 23:11:00'::timestamp without time zone >=
starttimetrunc) AND ('2005-04-09 23:11:00'::timestamp without time zone
<= finishtimetrunc))
Total runtime: 1981.802 ms

The times in sessions go from '2005-04-04 00:00:00' to '2005-04-10
23:59:00' so both are valid times to query for, but April 10th is more
towards the end. A little experimenting shows that if I go earlier than
'2005-04-10 13:26:15' seq scans are being used. I was thinking this
timestamp would have something to do with the histogram_bounds in
pg_stats, but I cannot find a match:

starttimetrunc | {"2005-04-04 00:05:00","2005-04-04
11:49:00","2005-04-04 22:03:00","2005-04-05 10:54:00","2005-04-05
21:08:00","2005-04-06 10:28:00","2005-04-07 01:57:00","2005-04-07
15:55:00","2005-04-08 10:18:00","2005-04-08 17:12:00","2005-04-10 23:57:00"}
finishtimetrunc | {"2005-04-04 00:05:00.93","2005-04-04
11:53:00.989999","2005-04-04 22:35:00.38","2005-04-05
11:13:00.029999","2005-04-05 21:31:00.989999","2005-04-06
10:45:01","2005-04-07 02:08:08.25","2005-04-07 16:20:00.93","2005-04-08
10:25:00.409999","2005-04-08 17:15:00.949999","2005-04-11 02:08:19"}

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a | www.trust-factory.com
2518AK The Hague | Phone: +31 70 3620684
The Netherlands | Fax : +31 70 3603009
-------------------------------------------
Have you visited our new DNA Portal?
-------------------------------------------

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Daniel Schuchardt 2005-04-21 12:19:06 Re: two queries and dual cpu (perplexed)
Previous Message Shoaib Burq (VPAC) 2005-04-21 11:49:53 two queries and dual cpu (perplexed)