Re: PostgreSQL Columnar Store for Analytic Workloads

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)citusdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL Columnar Store for Analytic Workloads
Date: 2014-04-08 06:28:09
Message-ID: CAFcOn2_CUt8hkDyEH2tk=wY4EAP1ZiKkauySdYmitykR=VmiTg@mail.gmail.com
Lists: pgsql-hackers

Hi Hadi

Do you think that cstore_fdw is also well suited for storing and
retrieving linked data (RDF)?

-S.

2014-04-03 18:43 GMT+02:00 Hadi Moshayedi <hadi(at)citusdata(dot)com>:

> Dear Hackers,
>
> We at Citus Data have been developing a columnar store extension for
> PostgreSQL. Today we are excited to open source it under the Apache v2.0
> license.
>
> This columnar store extension uses the Optimized Row Columnar (ORC) format
> for its data layout, which improves upon the RCFile format developed at
> Facebook, and brings the following benefits:
>
> * Compression: Reduces in-memory and on-disk data size by 2-4x. Can be
> extended to support different codecs. We used the functions in
> pg_lzcompress.h for compression and decompression.
> * Column projections: Only reads column data relevant to the query.
> Improves performance for I/O bound queries.
> * Skip indexes: Stores min/max statistics for row groups, and uses them to
> skip over unrelated rows.
>
> We used the PostgreSQL FDW APIs to make this work. The extension doesn't
> implement the writable FDW API, but it uses the process utility hook to
> enable the COPY command for columnar tables.
>
> This extension uses PostgreSQL's internal data type representation to
> store data in the table, so this columnar store should support all data
> types that PostgreSQL supports.
>
> We tried the extension on the TPC-H benchmark at a 4GB scale factor on an
> m1.xlarge Amazon EC2 instance, and query performance improved by 2x-3x
> compared to regular PostgreSQL tables. Note that we flushed the page cache
> before each test to see the impact on disk I/O.
>
> When data is cached in memory, the performance of cstore_fdw tables was
> close to the performance of regular PostgreSQL tables.
>
> For more information, please visit:
> * our blog post:
> http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
> * our github page: https://github.com/citusdata/cstore_fdw
>
> Feedback from you is really appreciated.
>
> Thanks,
> -- Hadi
>
>
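
For readers who don't know the "skip indexes" idea from the quoted announcement, here is a minimal Python sketch of min/max statistics per row group. It is only an illustration and not cstore_fdw's actual code or on-disk format: the names RowGroup, build_row_groups and scan_range, and the group size of 100, are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical row group: one column's values plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_range(groups: List[RowGroup], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Skip the whole group when its [min, max] range cannot overlap the filter.
        if group.max_value < low or group.min_value > high:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if low <= v <= high)
    return matches, groups_read


# Assumed example data: 1000 sorted integers in groups of 100.
groups = build_row_groups(list(range(1000)), group_size=100)
matches, groups_read = scan_range(groups, 250, 260)
```

With sorted data like this, the range filter touches a single row group out of ten. On randomly ordered data the min/max ranges overlap far more, so fewer groups can be skipped.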
