Re: Zedstore - compressed in-core columnar storage

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-10 10:13:07
Message-ID: CAA4eK1+=7ERXw1ZfUS8JQX-sUjWCA=wUAmYAMQWBB9q=TXKGrA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 10, 2019 at 12:55 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 10/04/2019 09:29, Amit Kapila wrote:
> > On Tue, Apr 9, 2019 at 5:57 AM Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:
> >> Row store
> >> ---------
> >>
> >> The tuples are stored one after another, sorted by TID. For each
> >> tuple, we store its 48-bit TID, an undo record pointer, and the actual
> >> tuple data uncompressed.
> >>
> >
> > Storing undo record pointer with each tuple can take quite a lot of
> > space in cases where you can't compress them.
>
> Yeah. This does depend on compression to eliminate the unused fields
> quite heavily at the moment. But you could have a flag in the header to
> indicate "no undo pointer needed", and just leave it out when it's not needed.
>
> > Have you thought about how you will implement the multi-locker scheme
> > in this design? In zheap, we have used undo for this, and it is easy
> > to imagine when you have separate transaction slots for each
> > transaction. I am not sure how you will implement the same here.
>
> I've been thinking that the undo record would store all the XIDs
> involved. So if there are multiple lockers, the UNDO record would store
> a list of XIDs.
>

This will be quite tricky. Whenever a new locker arrives, you first
need to fetch the previous undo record to see which XIDs already hold
a lock on the tuple. Not only that, it will make discarding undo
considerably more complicated. We considered this approach before
implementing the current one in zheap.
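
To make the first cost concrete, here is a minimal sketch of the
"list of XIDs in the undo record" idea. It is a toy model, not zheap
or Zedstore code: UndoRecord, UndoLog, TupleHeader and lock_tuple are
hypothetical names, and the single undo pointer per tuple is an
assumption taken from the proposal quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UndoRecord:
    """Hypothetical undo record carrying every XID that holds a lock."""
    locker_xids: List[int]
    prev: Optional[int] = None  # pointer to the previous undo record, if any


@dataclass
class UndoLog:
    """Toy undo log; `fetches` counts reads to make the extra cost visible."""
    records: Dict[int, UndoRecord] = field(default_factory=dict)
    next_ptr: int = 1
    fetches: int = 0

    def fetch(self, ptr: int) -> UndoRecord:
        self.fetches += 1
        return self.records[ptr]

    def append(self, rec: UndoRecord) -> int:
        ptr = self.next_ptr
        self.records[ptr] = rec
        self.next_ptr += 1
        return ptr


@dataclass
class TupleHeader:
    """Assumed layout: the tuple carries exactly one undo pointer."""
    undo_ptr: Optional[int] = None


def lock_tuple(log: UndoLog, tup: TupleHeader, xid: int) -> None:
    """Add `xid` as a locker; every locker after the first reads old undo."""
    lockers: List[int] = []
    if tup.undo_ptr is not None:
        # The existing locker list lives only in undo, so it must be fetched.
        lockers = list(log.fetch(tup.undo_ptr).locker_xids)
    if xid not in lockers:
        lockers.append(xid)
    # Write a new record with the full list and repoint the tuple at it.
    tup.undo_ptr = log.append(UndoRecord(lockers, prev=tup.undo_ptr))
```

In this model, three lockers on one tuple cause two undo fetches and
leave three undo records behind, the newest of which repeats all three
XIDs.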

> Alternatively, I suppose you could store multiple UNDO
> pointers for the same tuple.
>

This would not only make the tuple unnecessarily long, but also make
it much harder to reclaim that space once the corresponding undo is
discarded.
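
A rough sketch of the space problem, again as a toy model rather than
real storage code. StoredTuple, add_locker, stale_bytes and
rewrite_without_stale are hypothetical names, and the 8-byte pointer
width is only an assumption for the arithmetic.

```python
from dataclasses import dataclass, field
from typing import List, Set

UNDO_PTR_SIZE = 8  # assumed pointer width in bytes; illustrative only


@dataclass
class StoredTuple:
    """Hypothetical tuple that stores one undo pointer per locker inline."""
    data: bytes
    undo_ptrs: List[int] = field(default_factory=list)

    def on_disk_size(self) -> int:
        return len(self.data) + UNDO_PTR_SIZE * len(self.undo_ptrs)


def add_locker(tup: StoredTuple, undo_ptr: int) -> None:
    """Each additional locker makes the stored tuple one pointer longer."""
    tup.undo_ptrs.append(undo_ptr)


def stale_bytes(tup: StoredTuple, discarded: Set[int]) -> int:
    """Bytes still occupied by pointers whose undo has been discarded."""
    return UNDO_PTR_SIZE * sum(1 for p in tup.undo_ptrs if p in discarded)


def rewrite_without_stale(tup: StoredTuple, discarded: Set[int]) -> StoredTuple:
    """Reclaiming the space means rewriting the tuple without stale pointers."""
    live = [p for p in tup.undo_ptrs if p not in discarded]
    return StoredTuple(tup.data, live)
```

With a 20-byte payload and three lockers the tuple occupies 44 bytes
in this model. Discarding the undo behind two of the pointers leaves
16 stale bytes in place, and the size only drops to 28 once the tuple
itself is rewritten.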

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
