Skip site navigation (1) Skip section navigation (2)

Index selection on a large table

From: "Michael Richards" <michael(at)fastmail(dot)ca>
To: pgsql-sql(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Index selection on a large table
Date: 2000-07-25 23:03:38
Message-ID: 397E1CCA.000017.47769@frodo.searchcanada.ca (view raw or flat)
Thread:
Lists: pgsql-bugspgsql-sql
Hi.

I believe this to be a bug, but I am submitting it to the SQL list as 
well in the case I overlooked something.

I'm running 
Postgres 7.0.2 on FreeBSD 4.0-STABLE/Intel 
compiled with [gcc version 2.95.2 19991024 (release)] 

I've got this interesting problem where the query plan is not what I 
expect. Every month we log details of users logging into their 
webmail accounts. Every week before the database is vacuumed the 
oldest week's entries are removed. The table can be quite large. Here 
is the relevant parts of its structure:

           Table "logins"
 Attribute |    Type     | Modifier 
-----------+-------------+----------
 loginid   | varchar(16) | 
 ip        | varchar(15) | 
[...]
Indices: logins_ip_idx,
         logins_loginid_idx

The indexes are as follows:
  Index "logins_ip_idx"
 Attribute |    Type     
-----------+-------------
 ip        | varchar(15)
btree

Index "logins_loginid_idx"
 Attribute |    Type     
-----------+-------------
 loginid   | varchar(16)
btree

Size of the table:
fastmail=> select count(*) from logins;
  count  
---------
 1082564
(1 row)

Now here is a query plan from a selection using loginid:
explain select * from logins where loginid='michael';
NOTICE:  QUERY PLAN:
Index Scan using logins_loginid_idx on logins  (cost=0.00..500.57 
rows=130 width=48)

As expected it uses the logins_loginid_idx to select the rows that 
match loginid='michael';

Now I should note that I store the IP's as type varchar(15).
The following query yeilds the questionable query plan:

explain select * from logins where ip='38.30.141.44';
NOTICE:  QUERY PLAN:
Seq Scan on logins  (cost=0.00..25248.51 rows=11602 width=48)

This one decides to ignore the fact that IP is indexed (and the table 
was vacuumed) and it does a slow-assed sequential scan through a 
million or so rows to pick out (in this case 3 matching rows).

Just to be sure, I re-vacuumed the table and tried the questionable 
query again with the same results. Here is the vacuum output in case 
it helps:

NOTICE:  --Relation logins--
NOTICE:  Pages 11717: Changed 1, reaped 0, Empty 0, New 0; Tup 
1082580: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 
92; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 
2.28s/15.38u sec.
NOTICE:  Index logins_ip_idx: Pages 4550; Tuples 1082580. CPU 
0.78s/2.65u sec.
NOTICE:  Index logins_loginid_idx: Pages 3881; Tuples 1082580. CPU 
0.62s/2.67u sec.
VACUUM

-Michael

Responses

pgsql-bugs by date

Next:From: Tom LaneDate: 2000-07-25 23:30:39
Subject: Re: [SQL] Index selection on a large table
Previous:From: Tom LaneDate: 2000-07-24 03:21:51
Subject: Re: Converting a timestamp to a date when it contains nulls.

pgsql-sql by date

Next:From: Tom LaneDate: 2000-07-25 23:05:27
Subject: Re: pg_dump
Previous:From: sathya priyaDate: 2000-07-25 22:14:08
Subject: pg_dump

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group