Re: Questions about indexes?

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Ryan Bradetich <rbradetich(at)uswest(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Questions about indexes?
Date: 2003-02-17 09:54:40
Message-ID: 200302170954.h1H9seY14536@dcave.digsys.bg
Lists: pgsql-hackers

>>>Ryan Bradetich said:
> the table would look like:
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
> 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.
> etc...
>
> So I do not need the anomaly to be part of the index, I only need it to
>
> I agree with you that I would not normally add the anomaly to the
> index, except for the unique row requirement. Thinking about it now,
> maybe I should guarantee unique rows via a check constraint...
>
> Thanks for making me think about this in a different way!

(sorry this is a bit long)

Ryan,

I use somewhat similarly structured data (archived records of various events),
and when the database was set up (back when this baby was called postgres95), I
too used indexes on all possible fields.

My database consists of an 'operations' table, which holds data for the last x
days (for example), and several tables with archived records (per month or per
year; see below). The operations table sees frequent updates, which add new
data. Data is never modified, but lookups are frequent. The archived tables are
generated once and for all from the operations table (they may be merged in the
future, but I haven't yet made up my mind on this); after that, access is
read-only, although still fairly frequent.

What I have found over the many years of operating this database on different
PostgreSQL versions and hardware is that indexes have a considerable cost. :)
So does the requirement not to miss anything destined for the operations table
(that is, to collect data from many places and have it all in there).

I ended up with only a few indexes on the operations table, because the
processes that fill it up do minimal lookups to see if the data is already in
the table, and do inserts if not. Then, at regular intervals, the table is
cleaned up - that is, a process that removes the duplicates is run. This
unfortunately costs OIDs, but I found no other reasonable way to do the fast
inserts. Perhaps the best way is to create the table WITHOUT OIDS (but
wouldn't this still waste OIDs?), use COPY, and then clean up afterwards?
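
Roughly, that route would look something like the sketch below (the table and
column names here are made up for illustration, and the DISTINCT rebuild is
just one way to do the cleanup):

    -- sketch only: illustrative schema, created without OIDs
    CREATE TABLE operations (
        host_id    integer,
        logged_at  timestamp,
        category   text,
        anomaly    text
    ) WITHOUT OIDS;

    -- bulk-load a batch with no per-row duplicate checks
    COPY operations FROM '/path/to/todays_batch.dat';

    -- cleanup: rebuild the table, keeping one copy of each row
    SELECT DISTINCT host_id, logged_at, category, anomaly
      INTO operations_clean
      FROM operations;
    DROP TABLE operations;
    ALTER TABLE operations_clean RENAME TO operations;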

The archived tables are generated, then cleaned up. Then, as Tom suggested,
indexes are put on the archived tables, but only for the fields that are used
in queries. Once such a table is created, duplicate data can no longer appear,
because nothing is ever inserted into it again; therefore there is no need for
UNIQUE index enforcement.
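
Concretely, generating one monthly archive comes down to something like this
(again, the names and the date range are only illustrative):

    -- sketch: snapshot one month out of the operations table
    CREATE TABLE ops_2003_01 AS
        SELECT * FROM operations
         WHERE logged_at >= '2003-01-01'
           AND logged_at <  '2003-02-01';

    -- plain (non-UNIQUE) indexes, only on fields the queries actually use
    CREATE INDEX ops_2003_01_logged_at_idx ON ops_2003_01 (logged_at);
    CREATE INDEX ops_2003_01_category_idx  ON ops_2003_01 (category);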

If you need to have one large 'history' table, then perhaps you will just have
to do a (slow :)) select for each record before each insert, or just insert
the data and then run the cleanup process.
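
The select-before-insert variant can at least be collapsed into one statement
per record, along these lines (the values here are hypothetical):

    -- insert only if an identical row is not already present
    INSERT INTO history (host_id, logged_at, category, anomaly)
    SELECT 1, '2003-02-17 00:34:24', 'p101', 'user x has an invalid shell.'
     WHERE NOT EXISTS (
           SELECT 1 FROM history
            WHERE host_id   = 1
              AND logged_at = '2003-02-17 00:34:24'
              AND category  = 'p101'
              AND anomaly   = 'user x has an invalid shell.');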

Daniel
