RE: Zedstore - compressed in-core columnar storage

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Ashwin Agrawal' <aagrawal(at)pivotal(dot)io>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Zedstore - compressed in-core columnar storage
Date: 2019-07-01 02:59:17
Message-ID: 0A3221C70F24FB45833433255569204D1FC6B348@G01JPEXMBYT05
Lists: pgsql-hackers

From: Ashwin Agrawal [mailto:aagrawal(at)pivotal(dot)io]
> The objective is to gather feedback on design and approach to the same.
> The implementation has core basic pieces working but not close to complete.

Thank you for proposing a very interesting topic. Are you thinking of including this in PostgreSQL 13 if possible?

> * All Indexes supported
...
> work. Btree indexes can be created. Btree and bitmap index scans work.

Does Zedstore allow creating indexes of the existing types on the table (btree, GIN, BRIN, etc.) and performing index scans (point queries, range queries, etc.)?

> * Hybrid row-column store, where some columns are stored together, and
> others separately. Provide flexibility of granularity on how to
> divide the columns. Columns accessed together can be stored
> together.
...
> This way of laying out the data also easily allows for hybrid row-column
> store, where some columns are stored together, and others have a dedicated
> B-tree. Need to have user facing syntax to allow specifying how to group
> the columns.
...
> Zedstore Table can be
> created using command:
>
> CREATE TABLE <name> (column listing) USING zedstore;

Are you aiming to enable Zedstore to be used for HTAP, i.e. the same table can be accessed simultaneously for both OLTP and analytics with minimal performance impact on OLTP? (I got that impression from the word "hybrid".)
If yes, is the assumption that only a limited number of columns are to be stored in columnar format (for efficient scanning), and many other columns are to be stored in row format for efficient tuple access?
Are those row-formatted columns stored in the same file as the column-formatted columns, or in a separate file?

Regarding the column grouping, should I imagine something like the column families of HBase and Cassandra?
How would the current CREATE TABLE syntax express column grouping? (I guess CREATE TABLE needs a syntax for columnar storage, and Zedstore needs to be incorporated into core rather than provided as an extension...)

> A column store uses the same structure but we have *multiple* B-trees, one
> for each column, all indexed by TID. The B-trees for all columns are stored
> in the same physical file.
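The quoted layout (one tree per column, or per column group, all keyed by TID) can be illustrated with a toy model. This is a minimal sketch, not Zedstore's code: sorted Python lists stand in for the on-disk B-trees, TIDs are plain increasing integers, and the class names `TidTree` and `ToyColumnStore` are hypothetical. A group containing a single column behaves like a pure column store; a group of several columns behaves like the "hybrid" row-column case.

```python
from bisect import bisect_left


class TidTree:
    """Hypothetical stand-in for one B-tree: entries kept sorted by TID."""

    def __init__(self):
        self.tids = []
        self.values = []

    def insert(self, tid, value):
        # Keep entries ordered by TID, as a B-tree keyed on TID would.
        i = bisect_left(self.tids, tid)
        self.tids.insert(i, tid)
        self.values.insert(i, value)

    def lookup(self, tid):
        # Point lookup by TID via binary search.
        i = bisect_left(self.tids, tid)
        if i < len(self.tids) and self.tids[i] == tid:
            return self.values[i]
        raise KeyError(tid)

    def scan(self):
        # Ordered scan of (tid, value) pairs.
        return zip(self.tids, self.values)


class ToyColumnStore:
    """Hypothetical table: one TidTree per column group, all sharing TIDs."""

    def __init__(self, groups):
        self.groups = [tuple(g) for g in groups]
        self.trees = {g: TidTree() for g in self.groups}
        self.next_tid = 1

    def insert(self, row):
        # One TID is assigned per row; each group stores only its own columns.
        tid = self.next_tid
        self.next_tid += 1
        for g in self.groups:
            self.trees[g].insert(tid, tuple(row[c] for c in g))
        return tid

    def fetch(self, tid):
        # Rebuild a full row by probing every group's tree with the same TID.
        row = {}
        for g in self.groups:
            row.update(zip(g, self.trees[g].lookup(tid)))
        return row

    def scan_column(self, col):
        # Read only the tree of the group holding `col`; other groups are untouched.
        g = next((g for g in self.groups if col in g), None)
        if g is None:
            raise KeyError(col)
        idx = g.index(col)
        return [vals[idx] for _, vals in self.trees[g].scan()]


# "id" and "amount" are stored columnar; "name" and "email" are grouped together.
store = ToyColumnStore([("id",), ("name", "email"), ("amount",)])
t1 = store.insert({"id": 1, "name": "a", "email": "a@x", "amount": 10})
t2 = store.insert({"id": 2, "name": "b", "email": "b@x", "amount": 20})
print(store.fetch(t2))
print(store.scan_column("amount"))
```

Whether these trees live in one physical file or in one file per group is invisible at this level; the sketch only shows why a shared TID is enough to stitch a row back together from separately stored columns.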

Did you think that it's not a good idea to have a different file for each group of columns? Is that because we can't expect physical adjacency of data blocks on disk even if we put a column in a separate file?

I thought a separate file for each group of columns would be easier and less error-prone to implement and debug. Adding and dropping the column group would also be very easy and fast.

Regards
Takayuki Tsunakawa
