Re: Zedstore - compressed in-core columnar storage

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, DEV_OPS <devops(at)ww-it(dot)cn>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-08-26 12:35:57
Message-ID: CAE9k0PmibnFqQyHq8ERE=pyAsBb0n+4tVUHtGRiU97_jZOh+5w@mail.gmail.com
Lists: pgsql-hackers

Thanks, Ashwin and Heikki, for your responses. I have one more question.

If a BTree index is created on a zedstore table, the t_tid field of
an index tuple does not hold a physical tid pointing at a data block;
instead it holds something from which the logical tid can be derived.
So, when an IndexScan is performed on a zedstore table, it fetches the
tid from the index page, derives the logical tid from it, and then
retrieves the data corresponding to this logical tid from the zedstore
table. For that, it more or less performs a SeqScan on the zedstore
table for the given tid. From this it appears to me that an IndexScan
is no better than a SeqScan on a zedstore table. If that is true, will
we get any benefit from IndexScan on zedstore tables? Please let me
know if I am missing something here.

AFAIU, the following user-level query on a zedstore table

select * from zed_tab where a = 3;

gets internally converted to

select * from zed_tab where tid = 3;
-- assuming that the index is created on column 'a' and that the logical
-- tid associated with a = 3 is 3.
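
To make the question concrete, here is a toy Python model of the
two-step lookup I have in mind. It is only a sketch: ToyZedTable,
build_index, index_scan and fetch_by_tid are names I made up for
illustration, not zedstore code. In particular, the dict lookup inside
fetch_by_tid is a placeholder; how expensive the real fetch by logical
tid is remains exactly the part I am unsure about.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class ToyZedTable:
    """Hypothetical stand-in for a zedstore table: one store per column,
    each addressed by a logical TID."""

    def __init__(self, column_names: List[str]) -> None:
        self.columns: Dict[str, Dict[int, object]] = {n: {} for n in column_names}
        self.next_tid = 1

    def insert(self, row: Row) -> int:
        """Assign the next logical TID and store each column value under it."""
        tid = self.next_tid
        self.next_tid += 1
        for name in self.columns:
            self.columns[name][tid] = row[name]
        return tid

    def fetch_by_tid(self, tid: int) -> Optional[Row]:
        # Placeholder: a dict lookup stands in for whatever the real
        # table does to locate a logical TID.
        if not all(tid in store for store in self.columns.values()):
            return None
        return {name: store[tid] for name, store in self.columns.items()}


def build_index(table: ToyZedTable, column: str) -> Dict[object, List[int]]:
    """Map each key value of the indexed column to its logical TIDs."""
    index: Dict[object, List[int]] = {}
    for tid, value in table.columns[column].items():
        index.setdefault(value, []).append(tid)
    return index


def index_scan(table: ToyZedTable, index: Dict[object, List[int]], key: object) -> List[Row]:
    """Step 1: the index yields logical TIDs. Step 2: fetch each TID from the table."""
    rows = []
    for tid in index.get(key, []):
        row = table.fetch_by_tid(tid)
        if row is not None:
            rows.append(row)
    return rows


zed_tab = ToyZedTable(["a", "b"])
for i in range(1, 6):
    zed_tab.insert({"a": i, "b": f"row {i}"})

index_on_a = build_index(zed_tab, "a")
print(index_scan(zed_tab, index_on_a, 3))
```

In this toy model the key a = 3 maps to logical tid 3, so the scan
amounts to "fetch tid 3", which is the rewritten query above.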

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Aug 15, 2019 at 3:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 14/08/2019 20:32, Ashwin Agrawal wrote:
> > On Wed, Aug 14, 2019 at 2:51 AM Ashutosh Sharma wrote:
> >> 2) Is there a chance that IndexOnlyScan would ever be required for
> >> zedstore tables considering the design approach taken for it?
> >
> > We have not given much thought to IndexOnlyScans so far. But I think
> > IndexOnlyScan definitely would be beneficial for zedstore as
> > well. Even for normal index scans, fetching as many columns as
> > possible from the index itself and getting only the remaining
> > required columns from the table would be good for zedstore. It would
> > help to further
> > cut down IO. Ideally, for visibility checking only TidTree needs to be
> > scanned and visibility checked with the same, so the cost of checking
> > is much lower compared to heap (if VM can't be consulted) but still is
> > a cost. Also, with vacuum, if UNDO log gets trimmed, the visibility
> > checks are pretty cheap. Still, given all that, having a VM-like
> > structure to optimize this further would help.
>
> Hmm, yeah. An index-only scan on a zedstore table could perform the "VM
> checks" by checking the TID tree in the zedstore. It's not as compact as
> the 2 bits per TID in the heapam's visibility map, but it's pretty good.
>
> >> Further, I tried creating a zedstore table with a btree index on one
> >> of its columns and loaded around 50 lakh (5 million) records into the
> >> table. When the indexed column was scanned (with enable_seqscan set
> >> to off), it went for IndexOnlyScan, and that took around 15-20 times
> >> longer than an IndexOnlyScan on a heap table, just because
> >> IndexOnlyScan on zedstore always goes to the heap as the visibility
> >> check fails.
>
> Currently, an index-only scan on zedstore should be pretty much the same
> speed as a regular index scan. All the visibility checks will fail, and
> you end up fetching every row from the table, just like a regular index
> scan. So I think what you're seeing is that the index fetches on a
> zedstore table are much slower than on heap.
>
> Ideally, on a column store the index fetches would only fetch the needed
> columns, but I don't think that's been implemented yet, so all the
> columns are fetched. That can make a big difference, if you have a wide
> table with lots of columns, but only actually need a few of them. Was
> your test case something like that?
>
> We haven't spent much effort on optimizing index fetches yet, so I hope
> there are many other little tweaks there as well that we can do to make
> it faster.
>
> - Heikki
