Re: default index created for primary key

From: "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>
To: General PostgreSQL list <pgsql-general(at)postgresql(dot)org>
Subject: Re: default index created for primary key
Date: 2004-12-22 19:00:12
Message-ID: B510C398-544B-11D9-AF71-0050E410655F@fjrhome.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Dec 22, 2004, at 12:09 PM, vinita bansal wrote:
>
> I am still not clear on why postgres has this restriction?
> By uniqueness, you mean to say that if later anyone wants to add a
> primary key constraint on a table which already has a primary key
> defined, postgres will use this index to determine that there is
> already a primary key defined and would not allow to add this
> constraint since a table cannot have two primary keys??

No, an index is required for efficiency.

Consider a table with a column which must be unique. Assume there are
350,000 rows in the table. Now *every time* you insert a new row or
perform an update which changes that unique column, assuming no index
on the column, the database would need to check all 350,000 rows
individually to determine that the value is in fact unique.

With an index on the column, it is relatively quick for the database to
determine that the value is unique, as it does not need to check nearly
as many values..

To see this (rough example), start with an empty table with a single
column, which is a unique integer column. Now add values and watch
what happens to an index (use a fixed-width font):

4 53 72 15 23 19 3 12 8

Index:

4

4
\
53

53
/ \
4 72

53
/ \
4 72
\
15

53
/ \
15 72
/ \
4 23

19
/ \
15 53
/ \ \
4 23 72

15
/ \
/ \
/ \
4 53
/ \ / \
3 23 19 72

15
/ \
/ \
/ \
4 53
/ \ / \
3 23 19 72
\
12

15
/ \
/ \
/ \
4 53
/ \ / \
8 23 19 72
/ \
3 12

Now the user tries to insert 12, which is already in the table. In
order to determine that 12 is in the table, the database could scan
every value in the table until it finds it. In this case, it would
need to check 8 rows. Using the index, it would only need to check 4
values, cutting the time in half. In a few cases, it may take
marginally longer (2 as opposed to 1 for the value 4), but on average,
5 rows for unindexed vs. 2.8 rows for indexed, shows that the index has
a definite advantage.

Now extend this to 350,000 rows. Without an index, you'd need to check
an average of about 175,000 rows just to determine that a value was
there. And if the value is not there, as will more commonly be the
case, you'd need to check them all. With an index like the ones I used
above, you would need to check *at most* 19 values. You begin to see
why PostgreSQL requires an index here.

-----------------------------------------------------------
Frank D. Engel, Jr. <fde101(at)fjrhome(dot)net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2004-12-22 19:02:30 Re: nice work on the new site
Previous Message Vivek Khera 2004-12-22 18:46:21 Re: What HW / OS is recommeded