Re: [WIP]Vertical Clustered Index (columnar store extension)

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension)
Date: 2017-01-08 03:01:29
Message-ID: fa4e46a1-d6ee-723d-c3ca-c381bb7d91e9@BlueTreble.com
Lists: pgsql-hackers

On 12/29/16 9:55 PM, Haribabu Kommi wrote:
> The tuples which don't have multiple copies or frozen data will be moved
> from WOS to ROS periodically by the background worker process or autovacuum
> process. Each column's data is stored separately in its own relation file.
> There is no transaction information present in ROS. The data in ROS can be
> referenced by tuple ID.

Would updates be handled via the delete mechanism you described then?

> In this approach, the column data is present in both heap and columnar
> storage.

ISTM one of the biggest reasons to prefer a column store over heap is to
ditch the 24-byte per-tuple header overhead, so I'm not sure how much of
a win this is when the data still lives in the heap as well.

Another complication is that one of the big advantages of a CSTORE is
allowing analysis to be done efficiently on a column-by-column (as
opposed to row-by-row) basis. Does your patch by chance provide that?

Generally speaking, I do think the idea of adding support for this as an
"index" is a really good starting point, since that part of the system
is pluggable. It might be better to target getting only what needs to be
in core into core to begin with, allowing the other code to remain an
extension for now. I think there are a lot of things that will be
discovered as we start moving into column stores, and it'd be very
unfortunate to accidentally paint the core code into a corner somewhere.

As a side note, it's possible to get a lot of the benefits of a column
store by using arrays. I've done some experiments with that and got an
80-90% space reduction, and most queries saw improved performance as
well (there were a few cases that weren't better). The biggest advantage
to this approach is people could start using it today, on any recent
version of Postgres. That would be a great way to gain knowledge on what
users would want to see in a column store, something else I suspect we
need. It would also be far less code than what you or Alvaro are
proposing. When it comes to large changes that don't have crystal-clear
requirements, I think that's really important.
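
To make the array idea concrete, here is a rough back-of-envelope sketch
of where the space saving comes from: packing many values into one array
per row amortizes the per-tuple header across all of them. Everything in
it is an assumption for illustration, not a measurement: the function
names are hypothetical, the constants (24-byte padded tuple header,
4-byte line pointer, 24-byte header for a one-dimensional array without
nulls, 8-byte alignment) should be checked against your build, and it
ignores page headers, key columns, NULLs and TOAST.

```python
# Back-of-envelope model only; all constants below are assumptions.
TUPLE_HEADER = 24  # padded per-tuple header (the "24 byte overhead")
ITEM_POINTER = 4   # line pointer stored in the page for each tuple
ARRAY_HEADER = 24  # assumed header of a 1-D array with no nulls
MAXALIGN = 8       # assumed tuple alignment


def align(n, a=MAXALIGN):
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def row_per_value_bytes(n_values, value_size):
    """Estimated bytes when every value is stored in its own heap tuple."""
    return n_values * (ITEM_POINTER + align(TUPLE_HEADER + value_size))


def array_packed_bytes(n_values, value_size, values_per_row):
    """Estimated bytes when values are packed into fixed-size arrays.

    Every row is costed as a full array, so a partially filled last row
    is slightly overestimated.
    """
    n_rows = -(-n_values // values_per_row)  # ceiling division
    payload = ARRAY_HEADER + values_per_row * value_size
    per_row = ITEM_POINTER + align(TUPLE_HEADER + payload)
    return n_rows * per_row


if __name__ == "__main__":
    plain = row_per_value_bytes(1_000_000, 4)
    packed = array_packed_bytes(1_000_000, 4, 400)
    print(plain, packed, f"{1 - packed / plain:.1%} smaller")
```

For one million 4-byte values packed 400 to a row, this model gives
36,000,000 bytes versus 4,130,000 bytes. That is only arithmetic under
the stated assumptions; real tables will differ.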
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
