Skip site navigation (1) Skip section navigation (2)

Postgresql 8.4 and Data warehousing

From: Dhaval Rami <dhavalrami(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Postgresql 8.4 and Data warehousing
Date: 2009-11-02 11:18:46
Message-ID: 18e902e00911020318m517baf0ep170bc321e674933b@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-admin
Data warehousing using postgresql is it possible ?

Answer is yes, it is possible.

Postgresql has always been considered as not suitable for data
warehousing because of advanced database features present in it, like
transaction support , locking, etc.

But the fact is: We cannot decide performance of a database just by
reading about it, best way is to practically implement the scenario,
tune it and compare it with other databases.

I did just that and following are the points one should consider to
achieve data warehouse in postgresql.

    Handling transactions and locks
        -Optimum transaction size
        -Parallelism to avoid locks
    PostgreSQL parameter configuration
        -Tweaking postgresql configuration parameters to best utilize
available hardware.
    Compartmentalize data storage
        -Making logically physical division of data on disk
    Views and fact tables
        -Frequency of create and update view and maintaining fact tables
    Column Index planning
        -Deciding use of index.
    Query planning
        -Planning a query based on requested criteria
    Hardware/Software properties that effect performance
        -RAM, HDD properties and softwares that can increase /
decrease system performance

Case study: Open source SIEM Cyberoam-iview
-> http://sourceforge.net/projects/cyberoam-iview/

Cyberoam-iView is a practically successful example of A Real time data
warehouse using unchanged postgresql 8.4 and files.

-->Features:
-Supports up to 60,000 Events/Sec. ( collect, parse, process, insert
into postgresql tables)
-1040 predefined reports (OLAP Aggregated cubes).
-Storage capacity: Aggregated cubes with granularity of 5 minutes for
1 week, cubes with granularity of 4 hours for 1 year, cubes with
granularity of 24 hours for 7 years. Fact table in compressed form.
-Response time < 30 seconds for most of the reports.
-Real time view update, Aggregation cubes updated every 5 min.
-Reports for any time range selection (up to 7 years) supported.

-->Limitations:
-Does not support ad hoc aggregated report generation from fact table,
All reports needs to be predefined at the time of deployment.
-Fact table need to be extracted and loaded before searching.
-Each cube is updated with top n significant aggregated records where
0 < n < 4001.

-->Minimum Hardware requirements :
Pentium P4 processor 3.0 G Hz
2 GB RAM
1 TB HDD , standard company with read write speed > 50 MBps

Regards
Dhaval Rami

pgsql-admin by date

Next:From: Shruthi ADate: 2009-11-02 13:26:47
Subject: Need help in enabling remote connection
Previous:From: Hengky LieDate: 2009-11-02 05:18:45
Subject: Help ! Service fails to start

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group