Re: On columnar storage

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-12 04:34:58
Message-ID: CAA4eK1JVHSsdDZ7+AmZf_SdVCp_zsqVeHg7Zrq+w3wAnzx9NRw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 12, 2015 at 4:33 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:
>
> We hope to have a chance to discuss this during the upcoming developer
> unconference in Ottawa. Here are some preliminary ideas to shed some
> light on what we're trying to do.
>
>
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres. Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
> There already are some third-party CS implementations for Postgres; some
> of these work on top of the FDW interface, others are simply proprietary
> forks. Since I don't have access to any of their code, there's not much I
> can learn from them. If people with insider knowledge on them can chime
> in, perhaps we can work together -- collaboration is very welcome.
>
> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism. Even if there's a lot of simplicity of
> implementation, it's almost certain to introduce too many limitations.
>
> Simply switching all our code to use columnar storage rather than
> row-based storage is unlikely to go well. We're aiming at letting some
> columns of tables be part of a CS, while other parts would continue to
> be in the heap. At the same time, we're aiming at opening the way for
> different CS implementations instead of trying to provide a single
> one-size-fits-all one.
>
>
> There are several parts to this:
>
> 1. the CSM API
> 2. Cataloguing column stores
> 3. Query processing: rewriter, optimizer, executor
>

I think another important point is the format of column stores: will they
use the page format used by index/heap, and how will they be organised?

>
> The Column Store Manager API
> ----------------------------
>
> Since we want to have pluggable implementations, we need to have a
> registry of store implementations. I propose we add a catalog
> pg_cstore_impl with OID, name, and a bunch of function references to
> "open" a store, "getvalue" from it, "getrows" (to which we pass a qual
> and get a bunch of tuple IDs back), "putvalue".
>
> This is in line with our procedural language support.
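
To make the registry idea concrete, here is a minimal sketch of what one
registered implementation might look like as a table of function references.
Everything in it is an assumption for illustration: the names RowId, ColStore,
QualFn, CStoreImpl and the toy_* functions are hypothetical, values are
restricted to int32, and a "qual" is reduced to a plain predicate callback.
It is not the proposed pg_cstore_impl layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row identifier: could stand for a CTID or a logical id. */
typedef uint64_t RowId;

/* Hypothetical stand-in for a qual: returns nonzero if the value matches. */
typedef int (*QualFn)(int32_t value, void *arg);

#define TOY_CAPACITY 16

/* Per-store state of the toy implementation (fixed-size, in memory). */
typedef struct ColStore {
    RowId   rows[TOY_CAPACITY];
    int32_t values[TOY_CAPACITY];
    size_t  nused;
} ColStore;

/* One registry entry: a name plus the four function references. */
typedef struct CStoreImpl {
    const char *name;
    ColStore *(*open_store)(void);
    int       (*getvalue)(ColStore *cs, RowId row, int32_t *out);
    size_t    (*getrows)(ColStore *cs, QualFn qual, void *arg,
                         RowId *out, size_t max);
    int       (*putvalue)(ColStore *cs, RowId row, int32_t value);
} CStoreImpl;

static ColStore toy_state;

/* "open": reset and return the single static store. */
static ColStore *toy_open(void)
{
    memset(&toy_state, 0, sizeof(toy_state));
    return &toy_state;
}

/* "putvalue": overwrite an existing row's value or append a new row. */
static int toy_putvalue(ColStore *cs, RowId row, int32_t value)
{
    assert(cs != NULL);
    for (size_t i = 0; i < cs->nused; i++) {
        if (cs->rows[i] == row) {
            cs->values[i] = value;
            return 0;
        }
    }
    if (cs->nused == TOY_CAPACITY)
        return -1;              /* store is full */
    cs->rows[cs->nused] = row;
    cs->values[cs->nused] = value;
    cs->nused++;
    return 0;
}

/* "getvalue": fetch the value for one row; -1 if the row is unknown. */
static int toy_getvalue(ColStore *cs, RowId row, int32_t *out)
{
    assert(cs != NULL && out != NULL);
    for (size_t i = 0; i < cs->nused; i++) {
        if (cs->rows[i] == row) {
            *out = cs->values[i];
            return 0;
        }
    }
    return -1;
}

/* "getrows": collect up to max row ids whose value satisfies the qual. */
static size_t toy_getrows(ColStore *cs, QualFn qual, void *arg,
                          RowId *out, size_t max)
{
    size_t n = 0;

    assert(cs != NULL && qual != NULL);
    for (size_t i = 0; i < cs->nused && n < max; i++) {
        if (qual(cs->values[i], arg))
            out[n++] = cs->rows[i];
    }
    return n;
}

/* The entry a catalog row would point at, in this sketch. */
static const CStoreImpl toy_impl = {
    "toy", toy_open, toy_getvalue, toy_getrows, toy_putvalue
};

/* Example qual: value > *(int32_t *) arg. */
static int greater_than(int32_t value, void *arg)
{
    return value > *(int32_t *) arg;
}
```

A caller would go only through toy_impl (open_store, then putvalue, getvalue,
getrows), which is what would let a different implementation be swapped in
without touching the caller.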
>
> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation. There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
> Using CTIDs is simpler. One disadvantage is that every UPDATE of a row
> needs to let the CS know about the new location of the tuple, so that
> the value is known to be associated with the new tuple location as well
> as the old. This needs to happen even if the value of the column itself
> is not changed.
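
The CTID disadvantage described above can be illustrated with a small sketch.
It assumes a store keyed directly by a simplified CTID (block number plus
offset); the names Ctid, CtidKeyedStore, store_put, store_get and
store_note_update are hypothetical. The point is only that every UPDATE costs
one extra call into the store, even when the stored value is unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified physical tuple location (assumption, not the real struct). */
typedef struct Ctid {
    uint32_t block;
    uint16_t offset;
} Ctid;

typedef struct CtidEntry {
    Ctid    ctid;
    int32_t value;
} CtidEntry;

#define STORE_CAPACITY 32

/* Toy column store that maps CTIDs to values. */
typedef struct CtidKeyedStore {
    CtidEntry entries[STORE_CAPACITY];
    size_t    n;
} CtidKeyedStore;

static int ctid_eq(Ctid a, Ctid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* Record a value for a tuple location; -1 if the store is full. */
static int store_put(CtidKeyedStore *s, Ctid ctid, int32_t value)
{
    assert(s != NULL);
    if (s->n == STORE_CAPACITY)
        return -1;
    s->entries[s->n].ctid = ctid;
    s->entries[s->n].value = value;
    s->n++;
    return 0;
}

/* Look up the value for a tuple location; -1 if it is unknown. */
static int store_get(const CtidKeyedStore *s, Ctid ctid, int32_t *out)
{
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < s->n; i++) {
        if (ctid_eq(s->entries[i].ctid, ctid)) {
            *out = s->entries[i].value;
            return 0;
        }
    }
    return -1;
}

/*
 * Must be called for every UPDATE of the heap row, even when the stored
 * column did not change: the value becomes associated with the new
 * location while the old entry is kept.
 */
static int store_note_update(CtidKeyedStore *s, Ctid old_ctid, Ctid new_ctid)
{
    int32_t value;

    if (store_get(s, old_ctid, &value) != 0)
        return -1;
    return store_put(s, new_ctid, value);
}
```

With a logical tuple identifier that stays the same across UPDATEs, the
store_note_update step would not be needed in this sketch; the cost moves to
maintaining the mapping from logical id to heap location somewhere else.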

Isn't this somewhat similar to an index segment?
Will the column store obey a snapshot model similar to that of current heap
tuples? If so, will it derive the transaction information from the heap tuple?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
