IMCS: In Memory Columnar Store for PostgreSQL

From: knizhnik <knizhnik(at)garret(dot)ru>
To: pgsql-announce(at)postgresql(dot)org
Subject: IMCS: In Memory Columnar Store for PostgreSQL
Date: 2014-01-02 16:48:24
Message-ID: 52C59858.9090500@garret.ru
Lists: pgsql-announce pgsql-hackers

I want to announce the implementation of an In-Memory Columnar Store
extension for PostgreSQL.
The vertical representation of data is stored in PostgreSQL shared memory.
Various basic and sophisticated analytic operators are provided for
manipulating timeseries.

GitHub repository: https://github.com/knizhnik/imcs/
Documentation: http://www.garret.ru/imcs/user_guide.html
Sources: http://www.garret.ru/imcs-1.02.tar.gz

A columnar store manager stores data tables as sections of columns of
data rather than as rows of data.
Most traditional DBMSes store data in rows ("horizontally"): all
attributes of a record are stored together.
This approach makes it possible to load the whole record with a single
read operation, which usually leads to better performance for OLTP
queries (which access or update single records). But OLAP queries
mostly perform operations on individual columns, for example
calculating the sum or average of some column. In this case a vertical
data representation, where the data for each column is stored
independently, is more efficient. There are several DBMSes on the
market that are based on the vertical model: Vertica, SciDB, ...
Most mainstream commercial databases also provide OLAP extensions
based on vertical storage: BLU Acceleration for DB2, Oracle Database
In-Memory Option, Microsoft SQL Server column store, ...
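To make the row-versus-column distinction concrete, here is a minimal
Python sketch. It is not IMCS code; the Quote table, its field names and
its values are hypothetical. The same small table is stored both ways,
and an average of one column is computed from each layout.

```python
from array import array
from dataclasses import dataclass

# Hypothetical quote table, used only for illustration.
@dataclass
class Quote:
    day: int
    symbol: str
    close: float

# Row ("horizontal") layout: each record keeps all its attributes together.
rows = [
    Quote(1, "ABC", 10.0),
    Quote(2, "ABC", 11.0),
    Quote(3, "ABC", 12.5),
    Quote(4, "ABC", 12.5),
]

# Column ("vertical") layout: one sequence per attribute; numeric columns
# are packed into contiguous typed arrays.
columns = {
    "day": array("q", [q.day for q in rows]),
    "symbol": [q.symbol for q in rows],
    "close": array("d", [q.close for q in rows]),
}

def avg_close_rows(records):
    # Every record object is visited even though only `close` is needed.
    return sum(r.close for r in records) / len(records)

def avg_close_columns(cols):
    # Only the `close` column is read; `day` and `symbol` are never touched.
    close = cols["close"]
    return sum(close) / len(close)

print(avg_close_rows(rows), avg_close_columns(columns))  # 11.5 11.5
```

Both functions return the same answer; the difference is how much data
each one has to walk through to get it, which is what matters once a
table has many wide rows.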

A columnar store, or vertical representation of data, makes it possible
to achieve better performance than the classical horizontal
representation due to three factors:
* Reducing the size of fetched data: only the columns involved in a
query are accessed.
* Vector operations. Applying an operator to a set of values (a tile)
at once minimizes interpretation cost. SIMD instructions of modern
processors further accelerate execution of vector operations.
* Compression of data. Compression can of course also be applied to
whole records, but compressing each column independently can give much
better results without significant extra CPU overhead. For example, a
simple compression algorithm like RLE (run-length encoding) not only
reduces the space used, but also minimizes the number of operations
performed.
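The RLE point in the last item can be illustrated with a small Python
sketch. It is not taken from IMCS, and the function names rle_encode,
rle_decode and rle_sum are hypothetical. A column is encoded as
(value, run length) pairs, and its sum is computed directly from the
runs, so the number of operations grows with the number of runs rather
than the number of rows.

```python
from itertools import groupby

def rle_encode(column):
    """Encode a column as a list of (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

def rle_sum(runs):
    """Sum a run-length-encoded column without decompressing it:
    one multiply-add per run instead of one addition per value."""
    return sum(value * count for value, count in runs)

# Hypothetical timeseries column with long runs of repeated values.
status = [1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2]
runs = rle_encode(status)
print(runs)           # [(1, 4), (0, 3), (2, 5)]
print(rle_sum(runs))  # 14
```

Here twelve stored values shrink to three runs, and the sum needs three
multiply-adds instead of twelve additions. The benefit depends entirely
on how repetitive the column is; data with few repeated neighbours
gains nothing from RLE.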
