[WIP]Vertical Clustered Index (columnar store extension)

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: [WIP]Vertical Clustered Index (columnar store extension)
Date: 2016-12-30 03:55:39
Message-ID: CAJrrPGfaC7WC9NK6PTTy6YN-NN+hCy8xOLAh2doYhVg5d6HsAA@mail.gmail.com
Lists: pgsql-hackers

Hi All,

Fujitsu was interested in developing a columnar storage extension with
minimal changes to the server backend.

The columnar store is implemented as an extension using index access
methods. This can easily be adapted to pluggable storage methods once
they are available.

A new index method (VCI) is added to create a columnar index on a table.

The following is the basic design idea of the columnar extension:

The columnar data has an on-disk representation, so even after a crash,
the columnar format is recovered to the state it was in at the time of
the crash.

To provide a performance benefit for both read and write operations,
the data is stored in two formats:

1) write optimized storage (WOS)
2) read optimized storage (ROS).

This is useful for workloads where recently added data is much more
likely to be modified than older data.

WOS
====

In write optimized storage, the data of all columns that are part of the
VCI is stored in a row-wise format. All newly added data is stored in the
WOS relation together with its xmin/xmax information. If the user updates
or deletes newly added data, this costs much less than deleting the same
data from columnar storage.
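To make the WOS layout concrete, here is a minimal C sketch, assuming a fixed-size in-memory array stands in for the WOS relation and every indexed column is a 64-bit integer. The names `WosTuple`, `wos_insert` and `wos_delete` are hypothetical and are not taken from the attached patches. It illustrates why changing new data is cheap: an insert is an append that records xmin, and a delete only stamps xmax on a single row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the VCI patch's definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

#define WOS_NCOLS    3   /* number of columns covered by the VCI (assumed) */
#define WOS_CAPACITY 64  /* fixed capacity, for the sketch only */

/* One row-wise WOS entry: all indexed columns plus visibility info. */
typedef struct WosTuple
{
    TransactionId xmin;              /* inserting transaction */
    TransactionId xmax;              /* deleting transaction, or invalid */
    int64_t       values[WOS_NCOLS];
} WosTuple;

typedef struct Wos
{
    size_t   ntuples;
    WosTuple tuples[WOS_CAPACITY];
} Wos;

/* Insert is an append, much like adding an index entry. Returns the slot. */
static size_t
wos_insert(Wos *wos, TransactionId xid, const int64_t *values)
{
    assert(wos->ntuples < WOS_CAPACITY);
    WosTuple *t = &wos->tuples[wos->ntuples];

    t->xmin = xid;
    t->xmax = InvalidTransactionId;
    for (int i = 0; i < WOS_NCOLS; i++)
        t->values[i] = values[i];
    return wos->ntuples++;
}

/* Delete only stamps xmax on one row; no per-column data is rewritten. */
static void
wos_delete(Wos *wos, size_t slot, TransactionId xid)
{
    assert(slot < wos->ntuples);
    wos->tuples[slot].xmax = xid;
}
```

Because every WOS row keeps its own xmin/xmax, heap-style visibility checks can be applied to it directly until the row is frozen and moved out.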

Tuples that have no multiple versions, i.e. frozen data, are moved from
WOS to ROS periodically by a background worker process or by the
autovacuum process. In ROS, the data of each column is stored separately
in its own relation file. No transaction information is present in ROS;
the data in ROS is referenced by tuple ID.

In this approach, the column data is present in both the heap and the
columnar storage.

ROS
====

ROS is where all the column data is stored in columnar format.
Background workers continuously convert data from WOS to ROS based on a
tuple visibility check: once a tuple is frozen, it is moved from WOS to
ROS.
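One pass of the WOS-to-ROS movement can be sketched in C as below. This is a simplification under stated assumptions: "frozen" is approximated as "inserted before a freeze horizon and never deleted" (real code would also consult commit status and handle XID wraparound), and the ROS is modeled as one in-memory array per column with no transaction information. `wos_tuple_is_frozen` and `migrate_wos_to_ros` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define NCOLS    2
#define MAX_ROWS 16

typedef struct WosTuple
{
    TransactionId xmin, xmax;
    int64_t       values[NCOLS];
    bool          live;          /* false once the row has left the WOS */
} WosTuple;

/* Columnar ROS: one array per column, no xmin/xmax stored. */
typedef struct Ros
{
    size_t  nrows;
    int64_t columns[NCOLS][MAX_ROWS];
} Ros;

/* Simplified frozen test: ignores aborted xids and wraparound. */
static bool
wos_tuple_is_frozen(const WosTuple *t, TransactionId freeze_horizon)
{
    return t->xmin < freeze_horizon && t->xmax == InvalidTransactionId;
}

/* Hypothetical background-worker pass. Returns the number of rows moved. */
static size_t
migrate_wos_to_ros(WosTuple *wos, size_t n, Ros *ros,
                   TransactionId freeze_horizon)
{
    size_t moved = 0;

    for (size_t i = 0; i < n; i++)
    {
        WosTuple *t = &wos[i];

        if (!t->live || !wos_tuple_is_frozen(t, freeze_horizon))
            continue;
        assert(ros->nrows < MAX_ROWS);
        /* Scatter the row-wise values into the per-column arrays. */
        for (int c = 0; c < NCOLS; c++)
            ros->columns[c][ros->nrows] = t->values[c];
        ros->nrows++;
        t->live = false;
        moved++;
    }
    return moved;
}
```

Rows that are still in flux (recent xmin, or a pending xmax) stay in the WOS, where updating them is cheap.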

The data in ROS is stored in extents. Each extent contains 262,144 rows.
Because an extent holds a fixed number of records, a heap record can
easily be mapped to its columnar record through a TID-to-CRID map.
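Since 262,144 is 2^18, a fixed extent size makes locating a columnar record pure arithmetic. The C sketch below assumes a CRID is a flat 64-bit row number across all extents; that encoding, and the names `make_crid`, `crid_extent` and `crid_offset`, are illustrative assumptions rather than the patch's actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Each ROS extent holds exactly 262,144 (= 2^18) rows. */
#define VCI_ROWS_PER_EXTENT_BITS 18
#define VCI_ROWS_PER_EXTENT      ((uint64_t) 1 << VCI_ROWS_PER_EXTENT_BITS)

/* Hypothetical CRID encoding: extent number in the high bits, row in the low 18. */
typedef uint64_t Crid;

static Crid
make_crid(uint32_t extent, uint32_t offset)
{
    assert(offset < VCI_ROWS_PER_EXTENT);
    return ((Crid) extent << VCI_ROWS_PER_EXTENT_BITS) | offset;
}

/* Which extent a record lives in: a shift instead of a search. */
static uint32_t
crid_extent(Crid crid)
{
    return (uint32_t) (crid >> VCI_ROWS_PER_EXTENT_BITS);
}

/* Row position inside that extent: a mask instead of a lookup. */
static uint32_t
crid_offset(Crid crid)
{
    return (uint32_t) (crid & (VCI_ROWS_PER_EXTENT - 1));
}
```

With this encoding the TID-to-CRID map only needs to store one number per heap tuple; the extent and in-extent position follow from it.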

Insert
=====

The insert operation is just like inserting data into an index.

Select
=====

Because there are two storage formats, during a SELECT the data in WOS is
converted into a Local ROS for the statement being executed. The
conversion cost depends on the number of tuples present in the WOS, so
this may add some overhead to SELECT statements. A Local ROS lives until
the end of the query context.
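The Local ROS conversion can be pictured as a one-pass transpose of the visible WOS rows into temporary per-column arrays. In this C sketch, `malloc`/`free` stand in for what would presumably be a query-lifetime memory context in the real extension, and the snapshot is simplified to "xids below `snapshot_xmin` are committed". `LocalRos`, `local_ros_build` and `local_ros_free` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define NCOLS 2

typedef struct WosTuple { TransactionId xmin, xmax; int64_t values[NCOLS]; } WosTuple;

/* Per-query columnar copy of the WOS. */
typedef struct LocalRos { size_t nrows; int64_t *columns[NCOLS]; } LocalRos;

/* Simplified snapshot test: ignores aborts and in-progress transactions. */
static bool
tuple_visible(const WosTuple *t, TransactionId snapshot_xmin)
{
    bool inserted = t->xmin < snapshot_xmin;
    bool deleted = t->xmax != InvalidTransactionId && t->xmax < snapshot_xmin;

    return inserted && !deleted;
}

/* Stands in for releasing the query context at end of query. */
static void
local_ros_free(LocalRos *lros)
{
    for (int c = 0; c < NCOLS; c++)
    {
        free(lros->columns[c]);
        lros->columns[c] = NULL;
    }
    lros->nrows = 0;
}

static bool
local_ros_build(LocalRos *lros, const WosTuple *wos, size_t n,
                TransactionId snapshot_xmin)
{
    assert(wos != NULL || n == 0);
    lros->nrows = 0;
    for (int c = 0; c < NCOLS; c++)
        lros->columns[c] = NULL;
    for (int c = 0; c < NCOLS; c++)
    {
        /* Worst case every WOS tuple is visible; allocate at least one slot. */
        lros->columns[c] = malloc((n > 0 ? n : 1) * sizeof(int64_t));
        if (lros->columns[c] == NULL)
        {
            local_ros_free(lros);
            return false;
        }
    }
    /* One pass over the WOS: cost grows with the number of WOS tuples. */
    for (size_t i = 0; i < n; i++)
    {
        if (!tuple_visible(&wos[i], snapshot_xmin))
            continue;
        for (int c = 0; c < NCOLS; c++)
            lros->columns[c][lros->nrows] = wos[i].values[c];
        lros->nrows++;
    }
    return true;
}
```

After the build, the scan can read the ROS and the Local ROS through the same column-at-a-time path, which is why keeping the WOS small matters for SELECT performance.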

Delete
=====

During a delete operation, when data is deleted from the heap, the
corresponding data in the WOS is marked as deleted at the same time, in
the same way as in the heap. If the data has already been migrated from
WOS to ROS, a delete vector is maintained instead, which stores the tuple
ID, transaction information, etc. When data is read from ROS, each record
is checked against the delete vector to determine whether it is visible.
The delete vector data is applied to ROS periodically.
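The read-side check against the delete vector might look like the C sketch below. It assumes the delete vector is an array of (CRID, deleting xid) entries kept sorted by CRID, and it simplifies visibility to "the delete counts if the deleting xid is below the snapshot's xmin". The entry layout and `ros_record_visible` are hypothetical, not the patch's format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t Crid;

/* Hypothetical delete-vector entry: which ROS record, and who deleted it. */
typedef struct DeleteVectorEntry
{
    Crid          crid;
    TransactionId xmax;
} DeleteVectorEntry;

typedef struct DeleteVector
{
    size_t             n;
    DeleteVectorEntry *entries;   /* sorted by crid */
} DeleteVector;

/* A ROS record is visible unless a delete visible to the snapshot covers it. */
static bool
ros_record_visible(const DeleteVector *dv, Crid crid,
                   TransactionId snapshot_xmin)
{
    assert(dv != NULL);
    size_t lo = 0, hi = dv->n;

    /* Binary search for the first entry with entries[lo].crid >= crid. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (dv->entries[mid].crid < crid)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < dv->n && dv->entries[lo].crid == crid)
        return !(dv->entries[lo].xmax < snapshot_xmin);   /* simplified */
    return true;
}
```

Keeping deletes in a side structure like this leaves the ROS column files untouched until the periodic apply step.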

More details about the internal relations and their usage are available
in the README, although it still needs updates to explain the full design
of the columnar index.

The concept of the Vertical Clustered Index columnar extension comes from
Fujitsu Labs, Japan.

The following is a brief schedule of the patches required for a
well-performing columnar store:

1. Minimal server changes (new relkind "CSTORE" option)
2. Base storage patch
3. Support for moving data from WOS to ROS
4. Local ROS support
5. Custom scan support to read the data from ROS and Local ROS
6. Background worker support for data movement
7. Expression state support in VCI
8. Aggregation support in VCI
9. pg_dump support for the new type of relations
10. psql \d command support for CSTORE relations
11. Parallelism support
12. Compression support
13. In-memory support with dynamic shared memory

Currently I have attached only patches 1 and 2. These provide the basic
changes required in PostgreSQL core for the extension to work.

I still have to rebase/rewrite the rest of the patches against the latest
master and share them with the community.

Any comments on the approach?

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
0002-Base-storage-patch.patch application/octet-stream 117.7 KB
0001-Server-minimal-changes.patch application/octet-stream 17.8 KB
