Skip site navigation (1) Skip section navigation (2)

Re: database design question, new to postgresql

From: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
To: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
Cc: PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: database design question, new to postgresql
Date: 2007-04-27 20:51:00
Message-ID: 46326234.7000709@aepnetworks.com (view raw or flat)
Thread:
Lists: pgsql-novice
Fei Liu wrote:
> Fei Liu wrote:
>> Hello group, I need to design and develop a web reporting system to 
>> let users query/view syslog files on a unix host. For now, I am 
>> concentrating on the authentication file that has user logon 
>> (success/failure) and logoff records. The log file is logrotated 
>> every week or so. My reporting system parses the log entries and put 
>> the result into a postgresql database (I am proposing to use 
>> postgresql as the backend). Since this deals with multi-year archive 
>> and I believe 'partitioing' is an ideal feature to handle this 
>> problem. So here is the design scheme:
>>
>> CREATE TABLE logon_success(
>>   name varchar(32) not null,
>>   srcip inet not null,
>>   date date not null,
>>   time time not null,
>>   ...
>> );
>>
>>
>> CREATE TABLE logon_success_yy${year}mm${month}(
>>   CHECK (date >= DATE '$year-$month-01' AND date < DATE 
>> '$next_year-$next_month-1')
>> )
>> INHERITS ($tname)
>> ;
>>
>> As you can see from the sample code, I am using perl to dynamically 
>> generate children tables as I parse log files in a daily cron job 
>> script. Once the log file is analyzed and archived in the database, I 
>> have a simple web UI that sysadmin can select and view user logon 
>> events. I have built a sample framework and it works so far. Keep in 
>> mind, this reporting system is not limited to just user logon, it 
>> should also work with system events such as services 
>> failures/startup, hardware failures, etc
>>
>> Now here are my questions:
>> 1) Should I use database to implement such a reporting system? Are 
>> there any alternatives, architects, designs?
>> 2) Is partitioning a good approach to speed up log query/view? The 
>> user comment in partitioning in pgsql manual seems to indicate 
>> partitioning may be slower than non-partitioned table under certain 
>> circumstances.
>> 3) How to avoid repetitive log entry scanning since my cron job 
>> script is run daily but logrotate runs weekly? This means everytime 
>> my script will be parsing duplicate entries.
>> 4) When parsing log files, it's quite possible that there are 
>> identical entries, for example a user logins really fast, resulting 2 
>> or more identical entries..In this case can I still use primary 
>> key/index at all? If I can, how do I design primary key or index to 
>> speed up query?
>> 3) What are the most glaring limitations and flaws in my design?
>>
>> Thank you for taking time to review and answer my questions! Let me 
>> know if I am not clear on any specific detail..
>>
>> Fei
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
> Let me add one more question, what are the best approaches to analyze 
> postgresql query performance and how to improve postgresql query 
> performance?
> Fei
My initial testing has not shown any significant difference between a 
partitioning approach and a plain (all entries in master) database 
approach...
 2005-01-01 | 00:27:55 | firewood | ssh | Login Successful | None | 
local | user9819 | 192.168.1.31

My test was based on two artificial tables that has 1700 records per day 
from 2004-02-01 to 2007-04-27, around 2 million entries that are 
identical in both tables.
My test script:
echo Testing database $t1 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t1 where date > '2005-03-01' and date < 
'2006-12-11';
\q
EOF

echo Testing database $t2 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t2 where date > '2005-03-01' and date < 
'2006-12-11';
\q
EOF

Result:
./timing_test.sh
Testing database logon_test time based
  count
---------
 1121472
(1 row)

0.00user 0.00system 0:02.92elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps
Testing database logon_test2 time based
  count
---------
 1121472
(1 row)

0.00user 0.00system 0:02.52elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps

But the numbers are really not static and logon_test2 (with 
partitioning) sometimes behave worse than logon_test...
Fei

In response to

pgsql-novice by date

Next:From: Mark DeeringDate: 2007-04-28 03:27:25
Subject: need good editor for postgreSQL
Previous:From: Fei LiuDate: 2007-04-27 20:25:00
Subject: Re: database design question, new to postgresql

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group