Re: On columnar storage

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-14 15:22:49
Message-ID: 20150614152248.GF133018@postgresql.org
Lists: pgsql-hackers

Robert Haas wrote:
> On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > I've been trying to figure out a plan to enable native column stores
> > (CS or "colstore") for Postgres. Motivations:
> >
> > * avoid the 32 TB limit for tables
> > * avoid the 1600 column limit for tables
> > * increased performance
>
> To me, it feels like there are two different features here that would
> be better separated. First, there's the idea of having a table that
> gets auto-joined to other tables whenever you access it, so that the
> user sees one really wide table but really the data is segregated by
> column groups under the hood. That's a neat idea.

Thanks. (It also seems pretty tricky to implement.)
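
The auto-join idea can be pictured with a small Python model. This is a hypothetical sketch, not a proposed implementation. A logical wide table is stored as separate column groups that share a row number, and a scan joins back only the groups holding the requested columns. `ColumnGroupTable`, `insert`, `groups_for` and `scan` are illustrative names. Assuming each group lived in its own relation, per-relation limits would then apply to each group separately rather than to the whole logical table.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnGroupTable:
    """One logical wide table stored as separate column groups (hypothetical sketch)."""

    groups: dict[str, tuple[str, ...]]  # group name -> columns stored in that group
    storage: dict[str, dict[int, tuple]] = field(default_factory=dict)
    next_rowno: int = 0

    def __post_init__(self) -> None:
        # Each group gets its own store, keyed by the shared row number.
        for name in self.groups:
            self.storage[name] = {}

    def insert(self, row: dict) -> int:
        """Split a wide row across the groups under a single row number."""
        rowno = self.next_rowno
        self.next_rowno += 1
        for name, cols in self.groups.items():
            self.storage[name][rowno] = tuple(row[c] for c in cols)
        return rowno

    def groups_for(self, columns: set[str]) -> list[str]:
        """Return the groups that hold at least one requested column."""
        return [g for g, cols in self.groups.items() if any(c in columns for c in cols)]

    def scan(self, columns: set[str]):
        """Yield rows with the requested columns, joining only the groups needed."""
        needed = self.groups_for(columns)
        if not needed:
            return
        # The first needed group drives the scan; the others are probed by row number.
        for rowno in sorted(self.storage[needed[0]]):
            out = {}
            for g in needed:
                for col, value in zip(self.groups[g], self.storage[g][rowno]):
                    if col in columns:
                        out[col] = value
            yield out
```

With groups `{"ident": ("id", "name"), "sales": ("price", "qty")}`, a scan that asks only for `price` reads the `sales` group and never touches `ident`.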

> Second, there's the idea of a way of storing tuples that is different
> from PostgreSQL's usual mechanism - i.e. a storage manager API. I
> understand your concerns about going through the FDW API so maybe
> that's not the right way to do it, but it seems to me that in the end
> you are going to end up with something awfully similar to that + a
> local relfilenode + WAL logging support. I'm not clear on why you
> want to make the column store API totally different from the FDW API;
> there may be a reason, but I don't immediately see it.

I just don't see that the FDW API is such a good fit for what I'm trying
to do. Anything using the FDW API needs to implement its own visibility
checking, for instance. I want to avoid that, because it's its own
complex problem. Also, it doesn't look like the FDW API supports things
that I want to do (neither my proposed LateColumnMaterialization nor my
proposed BitmapColumnScan). I would have to extend the FDW API, then
contort my stuff so that it fits in the existing API; then I will need
to make sure that existing FDWs are not broken by the changes I would
propose elsewhere. Round peg, square hole is all I see here. All in
all, this seems like too much additional work just to make two things
that are really addressing different problems go through the same code.
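
Late materialization over column-wise storage can be sketched generically in Python. This is a hypothetical illustration of the general technique, not the LateColumnMaterialization node proposed for Postgres. Each column is modeled as a plain list. The filter column is scanned first, and the remaining columns are fetched only for the rows that pass. `late_materialize` and its parameters are illustrative names.

```python
from typing import Any, Callable, Sequence


def late_materialize(
    columns: dict[str, Sequence[Any]],
    filter_col: str,
    predicate: Callable[[Any], bool],
    output_cols: Sequence[str],
) -> list[tuple]:
    """Filter on one column, then fetch the output columns only for matching rows.

    `columns` maps each column name to its values, one entry per row position
    (a hypothetical stand-in for separately stored columns).
    """
    # Step 1: scan only the filter column, remembering matching row positions.
    matches = [pos for pos, value in enumerate(columns[filter_col]) if predicate(value)]
    # Step 2: build output tuples for the surviving positions only.
    return [tuple(columns[col][pos] for col in output_cols) for pos in matches]
```

With `{"price": [5, 50, 7, 80], "name": ["a", "b", "c", "d"]}` and the filter `price > 10`, all four prices are read but only two names are fetched.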

You're correct about "local WAL logging". We will need a solution to
that problem. I was hoping to defer that until we had something like
Alexander Korotkov's proposed pluggable WAL stuff.

> Each of these two features is independently useful. If you just had
> the first feature, you could use the existing table format as your
> columnar store. I'm sure it's possible to do a lot better in some
> cases, but there could easily be cases where that's a really big win,
> because the existing table format has far more sophisticated indexing
> capabilities than any columnar store is likely to have in an early
> version.

Yeah, sure, it's pretty likely that the first experimental colstore
implementation will just be based on existing infrastructure.

> The second capability, of course, opens up all sorts of
> interesting possibilities, like compressed read-only storage or
> index-organized tables. And it would also let you have an
> "all-columnar" table, similar to what Citus's cstore_fdw does, which
> doesn't seem like it would be supported in your design, and could be a
> pretty big win for certain kinds of tables.

Well, I would like to know about the use cases that the Citus stuff is
good at, so that we can make sure they work reasonably under my proposed
design. Maybe I will have to require vacuuming so that the all-visible
bits are set; a column scan could then skip visibility checking for most
of the underlying heap tuples when producing a large aggregation report.
That seems pretty reasonable to me.
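
Index-only scans already rely on a similar idea. When the visibility map says a heap page is all-visible, the scan can skip visiting that page's tuples to check visibility. Below is a minimal Python model of the same shortcut applied to a column aggregate. `Page`, `HeapTuple` and `column_sum` are hypothetical. The `visible` flag stands in for a real MVCC check against a snapshot, and `all_visible` stands in for the page's visibility-map bit as set by VACUUM.

```python
from dataclasses import dataclass


@dataclass
class HeapTuple:
    value: int
    visible: bool  # stand-in for a full MVCC visibility check (hypothetical)


@dataclass
class Page:
    all_visible: bool  # stand-in for the visibility-map bit set by VACUUM
    tuples: list[HeapTuple]


def column_sum(pages: list[Page]) -> tuple[int, int]:
    """Return (sum of visible values, number of per-tuple visibility checks)."""
    total = 0
    checks = 0
    for page in pages:
        if page.all_visible:
            # Every tuple on the page is visible to all snapshots: no checks needed.
            total += sum(t.value for t in page.tuples)
        else:
            # Fall back to checking each tuple individually.
            for t in page.tuples:
                checks += 1
                if t.visible:
                    total += t.value
    return total, checks
```

On a freshly vacuumed table where every page is all-visible, the check count drops to zero, which is what makes large aggregation reports cheap.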

> BTW, I'm not sure if it's a good idea to call all of this stuff
> "cstore". The name is intuitive, but cstore_fdw has enough traction
> already that it may create some confusion.

I'm not thinking of calling anything user-visible with the name
"cstore". It's just a development term. Surely we're not reserving
names just because third-party code happens to use them.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
