Re: On columnar storage

From: Qingqing Zhou <zhouqq(dot)postgres(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-11 23:58:00
Message-ID: CAJjS0u2Lh9ix9Ff7_gigXJEfC1+yPkoOdAbyzMFs+P3PQNiY+Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2015 at 4:03 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres. Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
And a better compression ratio.

> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.
>
Agreed. It would be cleaner to add a ColumnScan node that scans a
columnar table, and possibly a ColumnIndexScan node for seeks on an
indexed columnar table.
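
To illustrate what such a scan node buys us, here is a minimal sketch in
Python. It is not PostgreSQL executor code: the in-memory table layout and
the names column_scan, projection and filter_col are all hypothetical. The
point is only that the scan touches the filter column and the projected
columns, and never reads the rest.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical in-memory columnar table: column name -> list of values.
# All lists have the same length; a value's list index is its position.
ColumnTable = Dict[str, List[Any]]


def column_scan(
    table: ColumnTable,
    projection: Sequence[str],
    filter_col: str,
    predicate: Callable[[Any], bool],
) -> Iterator[Tuple[Any, ...]]:
    """Yield the projected values at each position where the filter passes."""
    filter_values = table[filter_col]
    projected = [table[name] for name in projection]
    for pos, value in enumerate(filter_values):
        if predicate(value):
            # Only the projected columns are read at this position.
            yield tuple(col[pos] for col in projected)


orders: ColumnTable = {
    "id": [1, 2, 3, 4],
    "amount": [10.0, 250.0, 75.5, 300.0],
    "note": ["a", "b", "c", "d"],  # never touched by the scan below
}
big_orders = list(column_scan(orders, ["id"], "amount", lambda a: a > 100))
```

A real node would also need visibility checks and batching, which this
sketch leaves out.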

> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.
>
If we do a real native implementation, where the columnar store sits on
par with the heap, it gives us arbitrary flexibility to control
performance and transactions, without worrying about compatibility with
the interface (which you define below).

> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation. There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
I like the concept of a half-row, half-columnar table: the row part is
good for select * and updates, and the columnar part serves other
purposes. Popular columnar-only tables use position alignment, which is
virtual (no storage), to associate the values of each column. CTIDs are
still needed, but not for this purpose. An alternative is:
1. Allow column groups, where several columns are physically stored
together;
2. Handle updates in a separate row store table associated with each
columnar table.
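
The two points above can be sketched as follows. This is a toy model in
Python under my own assumptions, not a proposed on-disk format: the class
HybridTable, its methods, and the choice to key the row store by position
are all hypothetical. Column groups are stored as separate arrays aligned
by position, and updates go to a side row store that reads consult first.

```python
from typing import Any, Dict, List, Sequence, Tuple


class HybridTable:
    """Toy column-group store with a separate row store for updates."""

    def __init__(self, column_groups: Sequence[Sequence[str]]) -> None:
        # Each group is a tuple of column names stored physically together.
        self.groups: List[Tuple[str, ...]] = [tuple(g) for g in column_groups]
        # One array per group; the list index is the implicit position,
        # so no identifier is stored to align values across groups.
        self.group_data: List[List[Tuple[Any, ...]]] = [[] for _ in self.groups]
        # Row store for updated rows: position -> full current row.
        self.delta: Dict[int, Dict[str, Any]] = {}

    def append(self, row: Dict[str, Any]) -> int:
        """Append a row to every column group and return its position."""
        for names, data in zip(self.groups, self.group_data):
            data.append(tuple(row[n] for n in names))
        return len(self.group_data[0]) - 1

    def read_row(self, pos: int) -> Dict[str, Any]:
        """Return the row at pos, preferring the row store if it was updated."""
        if pos in self.delta:
            return dict(self.delta[pos])
        row: Dict[str, Any] = {}
        for names, data in zip(self.groups, self.group_data):
            row.update(zip(names, data[pos]))
        return row

    def update(self, pos: int, changes: Dict[str, Any]) -> None:
        """Record an update in the row store; columnar data is untouched."""
        if not 0 <= pos < len(self.group_data[0]):
            raise IndexError(pos)
        row = self.read_row(pos)
        row.update(changes)
        self.delta[pos] = row

    def read_column(self, name: str) -> List[Any]:
        """Scan one column, overlaying values from the row store."""
        for names, data in zip(self.groups, self.group_data):
            if name in names:
                idx = names.index(name)
                return [
                    self.delta[pos][name] if pos in self.delta else vals[idx]
                    for pos, vals in enumerate(data)
                ]
        raise KeyError(name)


items = HybridTable([("id", "price"), ("descr",)])
items.append({"id": 1, "price": 9.5, "descr": "pen"})
items.append({"id": 2, "price": 20.0, "descr": "ink"})
items.update(1, {"price": 18.0})
prices = items.read_column("price")
```

In a real system the row store would be merged back into the column
groups periodically; that step is omitted here.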

> Query Processing
> ----------------
>
If we treat columnar storage as a first-class citizen like the heap, we
can model it after the heap, which allows much more natural changes in
the parser, rewriter, planner and executor.

Regards,
Qingqing
