Re: Zedstore - compressed in-core columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-16 16:15:24
Message-ID: 20190416161524.bxqdduppb55yasbg@development
Lists: pgsql-hackers

On Mon, Apr 15, 2019 at 10:45:51PM -0700, Ashwin Agrawal wrote:
>On Mon, Apr 15, 2019 at 12:50 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>> On Mon, Apr 15, 2019 at 9:16 AM Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
>> wrote:
>> > Would like to know more specifics on this Peter. We may be having
>> different context on hybrid row/column design.
>>
>> I'm confused about how close your idea of a TID is to the traditional
>> definition from heapam (and even zheap). If it's a purely logical
>> identifier, then why would it have two components like a TID? Is that
>> just a short-term convenience or something?
>>
>
>TID is purely logical identifier. Hence, stated in initial email that for
>Zedstore TID, block number and offset split carries no meaning at all. It's
>purely 48 bit integer entity assigned to datum of first column during
>insertion, based on where in BTree it gets inserted. Rest of the column
>datums are inserted using this assigned TID value. Just due to rest to
>system restrictions discussed by Heikki and Andres on table am thread poses
>limitations of value it can carry currently otherwise from zedstore design
>perspective it just integer number.
>

I'm not sure it's that clear cut, actually. Sure, it's not the usual
(block,item) pair, so it's not possible to jump to the exact location, and
it's not a raw physical identifier like a regular TID. But the data are
organized in a btree, with the TID as the key, so it does actually provide
some information about the location.
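
A minimal sketch of that point, purely illustrative and not how zedstore is
implemented: if each btree leaf covers a contiguous TID range, then a TID
does not give the exact location, but it does narrow the lookup to one leaf.
The function name and the list of per-leaf low keys are assumptions made up
for this example.

```python
from bisect import bisect_right


def find_leaf(leaf_low_keys, tid):
    """Return the index of the leaf whose TID range contains `tid`.

    `leaf_low_keys` is a sorted list holding the lowest TID stored on each
    leaf (a hypothetical stand-in for the btree's downlinks).
    """
    idx = bisect_right(leaf_low_keys, tid) - 1
    if idx < 0:
        raise KeyError(f"TID {tid} is below the first leaf")
    return idx


# Three leaves covering TIDs [1, 1000), [1000, 5000) and [5000, ...).
leaf_low_keys = [1, 1000, 5000]
nearby = [find_leaf(leaf_low_keys, t) for t in (1000, 1001, 1002)]
```

Consecutive TIDs end up on the same leaf (or an adjacent one), which is the
kind of locality a range-based index depends on.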

I've asked about BRIN indexes elsewhere in this thread, which I think is
related to this question, because that index type relies on TID providing
sufficient information about location. And I think BRIN indexes are going
to be rather important for colstores (and formats like ORC have something
very similar built-in).

But maybe all we'll have to do is define the ranges differently - instead
of "number of pages" we may define them as "number of rows", and that might
work.
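
A rough sketch of what "ranges by number of rows" could look like, assuming
logical TIDs are plain integers and a range is simply a fixed-size block of
TID values (tid // rows_per_range, by analogy with BRIN's pages_per_range).
All names here are hypothetical; this is not BRIN's or zedstore's actual
code, just min/max summaries keyed by TID range.

```python
def build_summaries(rows, rows_per_range):
    """Build min/max summaries per TID range.

    `rows` is an iterable of (tid, value) pairs. Returns a dict mapping
    range number -> (min_value, max_value).
    """
    summaries = {}
    for tid, value in rows:
        rng = tid // rows_per_range
        if rng in summaries:
            vmin, vmax = summaries[rng]
            summaries[rng] = (min(vmin, value), max(vmax, value))
        else:
            summaries[rng] = (value, value)
    return summaries


def candidate_tid_ranges(summaries, rows_per_range, lo, hi):
    """Return (first_tid, last_tid) for ranges that may hold values in [lo, hi]."""
    result = []
    for rng in sorted(summaries):
        vmin, vmax = summaries[rng]
        if vmax >= lo and vmin <= hi:
            first = rng * rows_per_range
            result.append((first, first + rows_per_range - 1))
    return result


rows = [(0, 10), (1, 12), (2, 11), (3, 50), (4, 55), (5, 52), (6, 90)]
summaries = build_summaries(rows, rows_per_range=3)
matches = candidate_tid_ranges(summaries, 3, lo=51, hi=60)
```

A scan would then only need to visit the TID ranges returned as candidates.
Gaps in the TID space (e.g. after deletes) just mean some ranges summarize
fewer rows than rows_per_range.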

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
