Re: Zedstore - compressed in-core columnar storage

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: DEV_OPS <devops(at)ww-it(dot)cn>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-08-14 17:32:22
Message-ID: CALfoeivAGExe37Q1yVJyYPm+7k1AUS2_QW_cFCNamDA5vU_Few@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 14, 2019 at 2:51 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
wrote:

> Hi Ashwin,
>
> I tried playing around with the zedstore code a bit today, and a
> couple of questions came to mind.
>

Great! Thank you.

>
> 1) Can zedstore tables be vacuumed? If yes, does VACUUM on a zedstore
> table set the VM bits associated with it?
>

Zedstore tables can be vacuumed. Vacuum does much less work than it
does on heap, though: the full table is not scanned. The UNDO log is
truncated/discarded based on RecentGlobalXmin, and only the TidTree
(or Meta column) is scanned to find dead tuples, whose index entries
are then cleaned up.

Currently, zedstore does not use the VM at all, so no operation
touches it.

> 2) Is there a chance that IndexOnlyScan would ever be required for
> zedstore tables considering the design approach taken for it?
>

We have not given much thought to IndexOnlyScans so far, but I think
IndexOnlyScan would definitely be beneficial for zedstore as well.
Even for normal index scans, fetching as many columns as possible from
the index itself and only the remaining required columns from the
table would be good for zedstore, since it would further cut down
I/O. Ideally, visibility checking only needs to scan the TidTree, so
the cost of the check is much lower than for heap (when the VM can't
be consulted), but it is still a cost. Also, once vacuum has trimmed
the UNDO log, the visibility checks are pretty cheap. Even so, having
something like the VM to optimize this further would help.
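
To illustrate why trimming the UNDO log makes the checks cheap, here
is a toy Python model, not zedstore code. It assumes each TidTree
entry carries an undo pointer and that an entry whose pointer is older
than the oldest retained UNDO record can be treated as visible to
everyone without reading UNDO. The names TidEntry, undo_ptr, and
oldest_undo_ptr are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TidEntry:
    tid: int
    undo_ptr: int  # hypothetical: position of this tuple's UNDO record


def needs_undo_lookup(entry: TidEntry, oldest_undo_ptr: int) -> bool:
    """Return True if the visibility check must read the UNDO log.

    Assumption: a pointer older than the oldest retained UNDO record
    means the tuple is visible to everyone, so no lookup is needed.
    """
    return entry.undo_ptr >= oldest_undo_ptr


def count_undo_lookups(entries: Iterable[TidEntry], oldest_undo_ptr: int) -> int:
    """Count the entries whose visibility check still has to read UNDO."""
    return sum(needs_undo_lookup(e, oldest_undo_ptr) for e in entries)
```

In this model, advancing oldest_undo_ptr (which is what trimming the
UNDO log amounts to) reduces the number of entries that need an UNDO
lookup at all.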

> Further, I tried creating a zedstore table with a btree index on one
> of its columns and loaded around 50 lakh (5 million) records into the
> table. When the indexed column was scanned (with enable_seqscan set
> to off), it went for IndexOnlyScan, and that took around 15-20 times
> longer than an IndexOnlyScan on a heap table, just because
> IndexOnlyScan in zedstore always goes to the heap as the visibility
> check fails. However, the seqscan on the zedstore table is
> considerably faster than the seqscan on the heap table because the
> time taken for I/O is much lower for zedstore.
>

Thanks for reporting, we will look into it. We should be able to
optimize it. Given that no VM exists, IndexOnlyScans on zedstore
currently behave more or less like IndexScans. The planner picks
IndexOnlyScans for zedstore mostly because the values of reltuples,
relpages, and relallvisible are off.
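
As a rough illustration of why these values matter, here is a
simplified Python sketch loosely modeled on how the planner discounts
table page fetches for an index-only scan using the all-visible
fraction (relallvisible / relpages). It is a toy model, not the actual
costing code, and the function names are made up for the example.

```python
import math


def all_visible_fraction(relallvisible: int, relpages: int) -> float:
    """Fraction of table pages marked all-visible, clamped to [0, 1]."""
    if relallvisible <= 0 or relpages <= 0:
        return 0.0
    if relallvisible >= relpages:
        return 1.0
    return relallvisible / relpages


def index_only_table_pages(pages_fetched: int, relallvisible: int, relpages: int) -> int:
    """Table pages an index-only scan is still expected to visit."""
    frac = all_visible_fraction(relallvisible, relpages)
    return math.ceil(pages_fetched * (1.0 - frac))
```

In this model, a relallvisible that is too high relative to relpages
makes the index-only scan look nearly free of table fetches, even if
every tuple actually requires a visibility check against the table.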

We have been focused on implementing and optimizing the AM pieces, so
not much work has been done on planner estimates and tuning yet. The
first step, proposed in [1], is to determine the needed columns in the
planner instead of the executor. Once that bakes, we will use it to
improve the planner estimates. Also, ANALYZE needs work to properly
reflect reltuples and relpages so that the planner is influenced
correctly.

[1]
https://www.postgresql.org/message-id/CAAKRu_ZQ0Jy7LfZDCY0JdxChdpja9rf-S8Y5%2BU4vX7cYJd62dA%40mail.gmail.com
