Postgresql 8.4 and Data warehousing

From: Dhaval Rami <dhavalrami(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Postgresql 8.4 and Data warehousing
Date: 2009-11-02 11:18:46
Message-ID: 18e902e00911020318m517baf0ep170bc321e674933b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Data warehousing using postgresql is it possible ?

Answer is yes, it is possible.

Postgresql has always been considered as not suitable for data
warehousing because of advanced database features present in it, like
transaction support , locking, etc.

But the fact is: We cannot decide performance of a database just by
reading about it, best way is to practically implement the scenario,
tune it and compare it with other databases.

I did just that and following are the points one should consider to
achieve data warehouse in postgresql.

Handling transactions and locks
-Optimum transaction size
-Parallelism to avoid locks
PostgreSQL parameter configuration
-Tweaking postgresql configuration parameters to best utilize
available hardware.
Compartmentalize data storage
-Making logically physical division of data on disk
Views and fact tables
-Frequency of create and update view and maintaining fact tables
Column Index planning
-Deciding use of index.
Query planning
-Planning a query based on requested criteria
Hardware/Software properties that effect performance
-RAM, HDD properties and softwares that can increase /
decrease system performance

Case study: Open source SIEM Cyberoam-iview
-> http://sourceforge.net/projects/cyberoam-iview/

Cyberoam-iView is a practically successful example of A Real time data
warehouse using unchanged postgresql 8.4 and files.

-->Features:
-Supports up to 60,000 Events/Sec. ( collect, parse, process, insert
into postgresql tables)
-1040 predefined reports (OLAP Aggregated cubes).
-Storage capacity: Aggregated cubes with granularity of 5 minutes for
1 week, cubes with granularity of 4 hours for 1 year, cubes with
granularity of 24 hours for 7 years. Fact table in compressed form.
-Response time < 30 seconds for most of the reports.
-Real time view update, Aggregation cubes updated every 5 min.
-Reports for any time range selection (up to 7 years) supported.

-->Limitations:
-Does not support ad hoc aggregated report generation from fact table,
All reports needs to be predefined at the time of deployment.
-Fact table need to be extracted and loaded before searching.
-Each cube is updated with top n significant aggregated records where
0 < n < 4001.

-->Minimum Hardware requirements :
Pentium P4 processor 3.0 G Hz
2 GB RAM
1 TB HDD , standard company with read write speed > 50 MBps

Regards
Dhaval Rami

Browse pgsql-admin by date

  From Date Subject
Next Message Shruthi A 2009-11-02 13:26:47 Need help in enabling remote connection
Previous Message Hengky Lie 2009-11-02 05:18:45 Help ! Service fails to start