
database design question, new to postgresql

From: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
To: PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org>
Subject: database design question, new to postgresql
Date: 2007-04-27 20:03:19
Message-ID: 46325707.7070607@aepnetworks.com
Lists: pgsql-novice
Hello group, I need to design and develop a web reporting system that lets 
users query/view syslog files on a unix host. For now, I am 
concentrating on the authentication log, which has user logon 
(success/failure) and logoff records. The log file is logrotated every 
week or so. My reporting system parses the log entries and puts the 
results into a postgresql database (I am proposing to use postgresql as 
the backend). Since this deals with a multi-year archive, I believe 
partitioning is an ideal feature for handling this problem. So here is the 
design scheme:

CREATE TABLE logon_success(
   name varchar(32) not null,
   srcip inet not null,
   date date not null,
   time time not null,
   ...
);


CREATE TABLE logon_success_yy${year}mm${month}(
   CHECK (date >= DATE '$year-$month-01' AND date < DATE '$next_year-$next_month-01')
)
INHERITS ($tname)
;
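
For example, for April 2007 the script would emit DDL like this 
(assuming $tname is the parent table logon_success):

CREATE TABLE logon_success_yy2007mm04(
   CHECK (date >= DATE '2007-04-01' AND date < DATE '2007-05-01')
)
INHERITS (logon_success)
;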

As you can see from the sample code, I am using perl to dynamically 
generate child tables as I parse log files in a daily cron job 
script. Once a log file is analyzed and archived in the database, a 
simple web UI lets the sysadmin select and view user logon 
events. I have built a sample framework and it works so far. Keep in 
mind, this reporting system is not limited to user logons; it should 
also work with system events such as service failures/startups, hardware 
failures, etc.

Now here are my questions:
1) Should I use a database to implement such a reporting system? Are there 
any alternatives, architectures, or designs I should consider?
2) Is partitioning a good approach to speed up log queries/views? The user 
comments on partitioning in the pgsql manual seem to indicate that a 
partitioned table may be slower than a non-partitioned one under certain 
circumstances. (The kind of query I have in mind is sketched after this list.)
3) How do I avoid repetitive log entry scanning, given that my cron job 
runs daily but logrotate runs weekly? As it stands, every run of my script 
will be parsing duplicate entries. (One idea is sketched after this list.)
4) When parsing log files, it is quite possible that there are identical 
entries, for example when a user logs in twice in quick succession, 
resulting in two or more identical rows. In this case, can I still use a 
primary key/index at all? If I can, how do I design the primary key or 
index to speed up queries? (A fallback is sketched after this list.)
5) What are the most glaring limitations and flaws in my design?
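
To illustrate question 2, here is the kind of month-bounded query I 
expect partitioning to speed up; as far as I understand, 
constraint_exclusion has to be enabled for the planner to skip the 
irrelevant child tables (the date range is just an example):

SET constraint_exclusion = on;

EXPLAIN SELECT name, srcip, time
  FROM logon_success
 WHERE date >= DATE '2007-04-01'
   AND date <  DATE '2007-05-01';

-- the plan should touch only logon_success_yy2007mm04 (plus the
-- empty parent table), not every child table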
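
For question 3, one idea I am considering is a small state table that 
records a high-water mark, i.e. the timestamp of the last entry already 
loaded, so the next cron run can skip everything at or before it (table 
and column names here are made up):

CREATE TABLE parse_state (
   logname     varchar(64) primary key,
   last_loaded timestamp not null
);

-- at the start of each run, fetch the high-water mark:
SELECT last_loaded FROM parse_state WHERE logname = 'auth.log';

-- parse only entries newer than last_loaded, then advance it:
UPDATE parse_state
   SET last_loaded = '2007-04-27 20:00:00'
 WHERE logname = 'auth.log';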
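
And for question 4, the fallback I have in mind is a surrogate serial 
key, so identical log lines can coexist, plus an ordinary non-unique 
index on the columns the web UI filters by. A sketch (note that each 
child table would need its own copy of the index, since indexes are not 
inherited):

-- parent table as above, with a surrogate key added
CREATE TABLE logon_success(
   id    serial primary key,  -- lets identical entries coexist
   name  varchar(32) not null,
   srcip inet not null,
   date  date not null,
   time  time not null
   -- ...remaining columns as above
);

-- non-unique index to speed up the common lookups; repeat per child
CREATE INDEX logon_success_name_date_idx
   ON logon_success (name, date);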

Thank you for taking the time to review and answer my questions! Let me 
know if I am not clear on any specific detail.

Fei
