Re: On columnar storage

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-12 13:22:42
Message-ID: CA+Tgmob3vd_hEE_jq25Ftar1cV4TKdyDA2NSL6PMumiR1Z0hUw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres. Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance

To me, it feels like there are two different features here that would
be better separated. First, there's the idea of having a table that
gets auto-joined to other tables whenever you access it, so that the
user sees one really wide table but really the data is segregated by
column groups under the hood. That's a neat idea. Second, there's
the idea of a way of storing tuples that is different from
PostgreSQL's usual mechanism - i.e. a storage manager API. I
understand your concerns about going through the FDW API, so maybe
that's not the right way to do it, but it seems to me that in the end
you are going to end up with something awfully similar to it, plus a
local relfilenode and WAL logging support. I'm not clear on why you
want to make the column store API totally different from the FDW API;
there may be a reason, but I don't immediately see it.
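
As a rough illustration of the first idea (a sketch only, not anything
from the proposal): the "auto-joined" table can be imitated with two
narrow tables that share a key and a view that joins them. SQLite is
used here only so the example is self-contained; the names wide,
wide_g1 and wide_g2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two narrow tables, one per column group, sharing the same key.
conn.execute("CREATE TABLE wide_g1 (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")
conn.execute("CREATE TABLE wide_g2 (id INTEGER PRIMARY KEY, c REAL, d TEXT)")

# The "auto-join": what the user sees as a single wide table.
conn.execute("""
    CREATE VIEW wide AS
    SELECT g1.id, g1.a, g1.b, g2.c, g2.d
    FROM wide_g1 AS g1
    JOIN wide_g2 AS g2 ON g2.id = g1.id
""")

conn.executemany("INSERT INTO wide_g1 VALUES (?, ?, ?)", [(1, 10, "x"), (2, 20, "y")])
conn.executemany("INSERT INTO wide_g2 VALUES (?, ?, ?)", [(1, 1.5, "p"), (2, 2.5, "q")])

# The query mixes columns from both groups without naming either group.
rows = conn.execute("SELECT id, a, d FROM wide ORDER BY id").fetchall()
print(rows)
```

A plain view leaves out what a real implementation would have to add,
such as routing writes to the right group and letting the planner skip
groups a query never touches.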

Each of these two features is independently useful. If you just had
the first feature, you could use the existing table format as your
columnar store. I'm sure it's possible to do a lot better in some
cases, but there could easily be cases where that's a really big win,
because the existing table format has far more sophisticated indexing
capabilities than any columnar store is likely to have in an early
version. The second capability, of course, opens up all sorts of
interesting possibilities, like compressed read-only storage or
index-organized tables. And it would also let you have an
"all-columnar" table, similar to what Citus's cstore_fdw does, which
doesn't seem like it would be supported in your design, and could be a
pretty big win for certain kinds of tables.
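
To illustrate the second idea in the abstract, here is a toy sketch of
one interface with two interchangeable storage layouts behind it. It is
not the FDW API or any actual PostgreSQL interface; TupleStore,
RowStore and ColumnStore are names made up for this example.

```python
from abc import ABC, abstractmethod


class TupleStore(ABC):
    """Hypothetical storage interface: callers never see the layout."""

    @abstractmethod
    def insert(self, row):
        """Store one row, given as a dict of column name -> value."""

    @abstractmethod
    def scan(self, columns):
        """Yield one tuple per row, restricted to the named columns."""


class RowStore(TupleStore):
    """Keeps each row together, roughly like a conventional heap."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def scan(self, columns):
        for row in self._rows:
            yield tuple(row[c] for c in columns)


class ColumnStore(TupleStore):
    """Keeps each column in its own list ("all-columnar")."""

    def __init__(self, columns):
        self._cols = {c: [] for c in columns}

    def insert(self, row):
        for name, values in self._cols.items():
            values.append(row[name])

    def scan(self, columns):
        # Only the requested columns are read; the others are never touched.
        return zip(*(self._cols[c] for c in columns))


data = [{"id": 1, "a": 10, "b": "x"}, {"id": 2, "a": 20, "b": "y"}]
row_store, col_store = RowStore(), ColumnStore(["id", "a", "b"])
for r in data:
    row_store.insert(r)
    col_store.insert(r)

row_result = list(row_store.scan(["id", "b"]))
col_result = list(col_store.scan(["id", "b"]))
```

Both stores return the same tuples for the same scan, which is the
point: the layout is a detail behind the interface.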

BTW, I'm not sure if it's a good idea to call all of this stuff
"cstore". The name is intuitive, but cstore_fdw has enough traction
already that it may create some confusion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
