Index selection on a large table

From: "Michael Richards" <michael(at)fastmail(dot)ca>
To: pgsql-sql(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Index selection on a large table
Date: 2000-07-25 23:03:38
Message-ID: 397E1CCA.000017.47769@frodo.searchcanada.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-sql

Hi.

I believe this to be a bug, but I am submitting it to the SQL list as
well in the case I overlooked something.

I'm running
Postgres 7.0.2 on FreeBSD 4.0-STABLE/Intel
compiled with [gcc version 2.95.2 19991024 (release)]

I've got this interesting problem where the query plan is not what I
expect. Every month we log details of users logging into their
webmail accounts. Every week before the database is vacuumed the
oldest week's entries are removed. The table can be quite large. Here
is the relevant parts of its structure:

Table "logins"
Attribute | Type | Modifier
-----------+-------------+----------
loginid | varchar(16) |
ip | varchar(15) |
[...]
Indices: logins_ip_idx,
logins_loginid_idx

The indexes are as follows:
Index "logins_ip_idx"
Attribute | Type
-----------+-------------
ip | varchar(15)
btree

Index "logins_loginid_idx"
Attribute | Type
-----------+-------------
loginid | varchar(16)
btree

Size of the table:
fastmail=> select count(*) from logins;
count
---------
1082564
(1 row)

Now here is a query plan from a selection using loginid:
explain select * from logins where loginid='michael';
NOTICE: QUERY PLAN:
Index Scan using logins_loginid_idx on logins (cost=0.00..500.57
rows=130 width=48)

As expected it uses the logins_loginid_idx to select the rows that
match loginid='michael';

Now I should note that I store the IP's as type varchar(15).
The following query yeilds the questionable query plan:

explain select * from logins where ip='38.30.141.44';
NOTICE: QUERY PLAN:
Seq Scan on logins (cost=0.00..25248.51 rows=11602 width=48)

This one decides to ignore the fact that IP is indexed (and the table
was vacuumed) and it does a slow-assed sequential scan through a
million or so rows to pick out (in this case 3 matching rows).

Just to be sure, I re-vacuumed the table and tried the questionable
query again with the same results. Here is the vacuum output in case
it helps:

NOTICE: --Relation logins--
NOTICE: Pages 11717: Changed 1, reaped 0, Empty 0, New 0; Tup
1082580: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen
92; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU
2.28s/15.38u sec.
NOTICE: Index logins_ip_idx: Pages 4550; Tuples 1082580. CPU
0.78s/2.65u sec.
NOTICE: Index logins_loginid_idx: Pages 3881; Tuples 1082580. CPU
0.62s/2.67u sec.
VACUUM

-Michael

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2000-07-25 23:30:39 Re: [SQL] Index selection on a large table
Previous Message Tom Lane 2000-07-24 03:21:51 Re: Converting a timestamp to a date when it contains nulls.

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2000-07-25 23:05:27 Re: pg_dump
Previous Message sathya priya 2000-07-25 22:14:08 pg_dump