Re: PostgreSQL, OLAP, and Large Clusters

From: Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To: Ryan Kelly <rpkelly22(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL, OLAP, and Large Clusters
Date: 2012-09-26 22:51:31
Message-ID: CAM6mieKG0XSeZ+A=jFGqGCPM9ZXSWQo+Ob6oL=cbXknZFN_ozQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

On 26 September 2012 21:50, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
> The size of our database is growing rather rapidly. We're concerned
> about how well Postgres will scale for OLAP-style queries over terabytes
> of data. Googling around doesn't yield great results for vanilla
> Postgres in this application, but generally links to other software like
> Greenplum, Netezza, and Aster Data (some of which are based off of
> Postgres). Too, there are solutions like Stado. But I'm concerned about
> the amount of effort to use such solutions and what we would have to
> give up feature-wise.

We were able to have 2TB of data (16 Cores, 96GB RAM, FC connected
SAN). Performance was good:
- up to 5sec for simple select with multiple where conditions (2 - 25)
order by any column (of 400 columns) and return top 10k
- up to 15sec for executing 5 queries (simultaneously) which return
top 20 combination of any two columns
- up to 25sec for executing 56 queries (using materialised aggregate tables)
- aggregation job ran every 15 minutes and completed under 2 minutes:
5mil rows -> aggregation -> 56 tables
- all queries can be executed over date range up to several months
(monthly partitioned tables, 6 months history)

but it was very hard to scale this solution. We have tried:
- FusionIO cards: 10 to 100 times better performance, but very hard to
expand storage capacity; Cooling/power issues
- AsterData: nice SQL-MR feature and analytics (decision trees,
frequent items, clustering, ...); No libpq support and you have to use
JDBC or selected ODBC manager
- Greenplum (winer): performance comparable to FusionIO (10 to 50
times); we were able to remove aggregation job (because of columnar
store model); easy to port from postgres but could be complicated if
you are heavy pgpsql user

At this time I would try:
- Postgres-XC
- Stado
- Cassandra + Solr + Hadoop (for example DataStax Enterprise)
- Birst (http://www.birst.com/)

PS. For any commercial solution is better to buy their appliance (if
you can afford it...).

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2012-09-26 23:07:37 Re: Odd Invalid type name error in postgresql 9.1
Previous Message Samuel Gendler 2012-09-26 22:42:09 Re: Inaccurate Explain Cost