Re: Questions about indexes?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Cc: rbradetich(at)uswest(dot)net
Subject: Re: Questions about indexes?
Date: 2003-02-18 17:37:20
Message-ID: 200302180937.21001.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Ryan,

I am crossing this discussion to the PGSQL-PERFORMANCE list, which is the
proper place for it. Anyone interested, please follow us there!

>>>Ryan Bradetich said:
> the table would look like:
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
> 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.
> etc...
>
> So I do not need the anomaly to be part of the index, I only need it to
>
> I agree with you, that I would not normally add the anomally to the
> index, except for the unique row requirement. Thinking about it now,
> maybe I should guarentee unique rows via a check constraint...
>
> Thanks for making me think about this in a different way!

First off, I'm not clear on why a duplicate anominaly would be necessarily
invalid, given the above. Not useful, certainly, but legitimate real data.

I realize that you have performance considerations to work with. However, I
must point out that your database design is not *at all* normalized ... and
that normalizing it might solve some of your indexing problems.

A normalized structure would look something like:

TABLE operations
id serial not null primary key,
host_id int not null,
timeoccurred timestamp not null default now(),
category varchar(5) not null,
constraint operations_unq unique (host_id, timeoccurred, category)

TABLE anominalies
id serial not null primary key,
operation_id int not null references operations(id) on delete cascade,
anominaly text

This structure would have two advantages for you:
1) the operations table would be *much* smaller than what you have now, as you
would not be repeating rows for each anominaly.
2) In neither table would you be including the anominaly text in an index ...
thus reducing index size tremendously.

As a warning, though: you may find that for insert speed the referential
integrity key is better maintained at the middleware layer. We've had some
complaints about slow FK enforcement in Postgres.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message afterz 2003-02-18 17:59:01 New Feature - Project
Previous Message Hannu Krosing 2003-02-18 17:08:50 Re: Detecting corrupted pages earlier

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2003-02-18 19:08:16 Re: WAL replay logic (was Re: [PERFORM] Mount options for
Previous Message Manfred Koizar 2003-02-18 17:31:48 Re: cost and actual time