Quick Links

Re: PostgreSQL, OLAP, and Large Clusters

From:	Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To:	Ryan Kelly <rpkelly22(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: PostgreSQL, OLAP, and Large Clusters
Date:	2012-09-26 22:51:31
Message-ID:	CAM6mieKG0XSeZ+A=jFGqGCPM9ZXSWQo+Ob6oL=cbXknZFN_ozQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

On 26 September 2012 21:50, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
> The size of our database is growing rather rapidly. We're concerned
> about how well Postgres will scale for OLAP-style queries over terabytes
> of data. Googling around doesn't yield great results for vanilla
> Postgres in this application, but generally links to other software like
> Greenplum, Netezza, and Aster Data (some of which are based off of
> Postgres). Too, there are solutions like Stado. But I'm concerned about
> the amount of effort to use such solutions and what we would have to
> give up feature-wise.

We were able to have 2TB of data (16 Cores, 96GB RAM, FC connected
SAN). Performance was good:
- up to 5sec for simple select with multiple where conditions (2 - 25)
order by any column (of 400 columns) and return top 10k
- up to 15sec for executing 5 queries (simultaneously) which return
top 20 combination of any two columns
- up to 25sec for executing 56 queries (using materialised aggregate tables)
- aggregation job ran every 15 minutes and completed under 2 minutes:
5mil rows -> aggregation -> 56 tables
- all queries can be executed over date range up to several months
(monthly partitioned tables, 6 months history)

but it was very hard to scale this solution. We have tried:
- FusionIO cards: 10 to 100 times better performance, but very hard to
expand storage capacity; Cooling/power issues
- AsterData: nice SQL-MR feature and analytics (decision trees,
frequent items, clustering, ...); No libpq support and you have to use
JDBC or selected ODBC manager
- Greenplum (winer): performance comparable to FusionIO (10 to 50
times); we were able to remove aggregation job (because of columnar
store model); easy to port from postgres but could be complicated if
you are heavy pgpsql user

At this time I would try:
- Postgres-XC
- Stado
- Cassandra + Solr + Hadoop (for example DataStax Enterprise)
- Birst (http://www.birst.com/)

PS. For any commercial solution is better to buy their appliance (if
you can afford it...).

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

In response to

PostgreSQL, OLAP, and Large Clusters at 2012-09-26 11:50:44 from Ryan Kelly

Responses

Re: PostgreSQL, OLAP, and Large Clusters at 2012-09-27 18:34:41 from Ryan Kelly

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2012-09-26 23:07:37	Re: Odd Invalid type name error in postgresql 9.1
Previous Message	Samuel Gendler	2012-09-26 22:42:09	Re: Inaccurate Explain Cost