Re: [Proposal] Extend TableAM routines for ANALYZE scan

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Pengzhou Tang <ptang(at)pivotal(dot)io>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Extend TableAM routines for ANALYZE scan
Date: 2019-12-23 12:51:28
Message-ID: CAOBaU_Z4fRRvzwMMprAe8fHSXu2LMwMET2O4WNrwUZRKozf02A@mail.gmail.com
Lists: pgsql-hackers

Hello,

On Thu, Dec 5, 2019 at 11:14 AM Pengzhou Tang <ptang(at)pivotal(dot)io> wrote:
>
> While hacking on Zedstore, we needed more accurate statistics for it and
> ran into some restrictions:
> 1) acquire_sample_rows() always uses RelationGetNumberOfBlocks() to generate the sampling block
> numbers. This is not friendly to Zedstore, which wants to use logical block numbers, and it might
> also be unfriendly to non-block-oriented table AMs.
> 2) Zedstore stores the columns of a table separately, so the columns of a row have different physical
> positions. The tid in a tuple is invalid for Zedstore, which means the correlation statistic is incorrect for it.
> 3) RelOptInfo->pages is not correct for Zedstore if we only access a subset of the columns, which makes
> the estimated IO cost much higher than the actual cost.
>
> For 1) and 2), we propose to extend the existing ANALYZE-scan table AM routines in patch
> "0001-ANALYZE-tableam-API-change.patch", which adds three more APIs:
> scan_analyze_beginscan(), scan_analyze_sample_tuple(), and scan_analyze_endscan(). This gives
> table AMs more control over every step of sampling rows. Meanwhile,
> with the new structure named "AcquireSampleContext", we can acquire extra info (e.g. physical position,
> physical size) in addition to the actual column values.
>
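
To make the proposed begin/sample/end flow concrete, here is a minimal, self-contained sketch. It is not taken from the patch: the struct layout, the callback signatures, the driver function acquire_sample_rows_sketch() and the toy AM are all hypothetical and only illustrate how a table AM could control each sampling step and report extra per-column info (here, disk size) through a context structure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLS 4

/* Hypothetical stand-in for the proposed AcquireSampleContext: besides the
 * sample counters it accumulates extra per-column info (on-disk bytes). */
typedef struct AcquireSampleContext
{
    int     ncols;              /* number of columns being analyzed */
    int     nsampled;           /* rows sampled so far */
    int     targrows;           /* requested sample size */
    double  disksize[MAX_COLS]; /* accumulated physical size per column */
} AcquireSampleContext;

/* Hypothetical callback table; the real signatures are defined by the patch. */
typedef struct AnalyzeScanRoutine
{
    void (*scan_analyze_beginscan)(AcquireSampleContext *ctx);
    int  (*scan_analyze_sample_tuple)(AcquireSampleContext *ctx); /* 0 = no more rows */
    void (*scan_analyze_endscan)(AcquireSampleContext *ctx);
} AnalyzeScanRoutine;

/* Generic driver: the AM, not the caller, decides how rows are located. */
static int
acquire_sample_rows_sketch(const AnalyzeScanRoutine *am, AcquireSampleContext *ctx)
{
    assert(am != NULL && ctx != NULL);

    am->scan_analyze_beginscan(ctx);
    while (ctx->nsampled < ctx->targrows && am->scan_analyze_sample_tuple(ctx))
        ctx->nsampled++;
    am->scan_analyze_endscan(ctx);

    return ctx->nsampled;
}

/* Toy columnar AM: 5 logical rows, column i costs (i + 1) * 8 bytes per row. */
static int toy_rows_left;

static void
toy_begin(AcquireSampleContext *ctx)
{
    toy_rows_left = 5;
    ctx->nsampled = 0;
    for (int i = 0; i < ctx->ncols; i++)
        ctx->disksize[i] = 0.0;
}

static int
toy_sample(AcquireSampleContext *ctx)
{
    if (toy_rows_left == 0)
        return 0;
    toy_rows_left--;
    for (int i = 0; i < ctx->ncols; i++)
        ctx->disksize[i] += (i + 1) * 8.0;
    return 1;
}

static void
toy_end(AcquireSampleContext *ctx)
{
    (void) ctx;                 /* nothing to release in this toy AM */
}

static const AnalyzeScanRoutine toy_am = { toy_begin, toy_sample, toy_end };
```

Calling acquire_sample_rows_sketch(&toy_am, &ctx) with ctx.ncols = 2 and ctx.targrows = 3 samples three rows and leaves the accumulated per-column sizes in ctx.disksize. The driver never looks at block numbers, which is the point of handing the steps to the AM.
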
> For 3), we hope to have a mechanism similar to RelOptInfo->rows, which is calculated as
> (RelOptInfo->tuples * Selectivity): we can calculate RelOptInfo->pages with a page selectivity that
> is based on the selected Zedstore columns. 0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch
> shows one idea: add `stadiskfrac` to pg_statistic and have the planner use it to estimate
> RelOptInfo->pages.
>
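
As a rough illustration of the page-selectivity idea, the sketch below scales a relation's page count by the summed disk fraction of the selected columns. Summing the per-column `stadiskfrac` values, clamping the sum at 1.0, and the function name estimate_pages_for_columns() are assumptions made for this example; the 0002 patch may compute the estimate differently.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate how many pages a scan touches when it reads
 * only some columns of a column-oriented table.
 *
 *   relpages     total pages of the relation
 *   stadiskfrac  per-column fraction of the on-disk size (assumed to sum to <= 1)
 *   selected     indexes of the columns the query reads
 *   nselected    number of entries in "selected"
 */
static double
estimate_pages_for_columns(double relpages, const double *stadiskfrac,
                           const int *selected, int nselected)
{
    double frac = 0.0;

    assert(relpages >= 0.0);

    /* Page selectivity = sum of the disk fractions of the selected columns. */
    for (int i = 0; i < nselected; i++)
        frac += stadiskfrac[selected[i]];

    /* Guard against fractions that add up to slightly more than the whole. */
    if (frac > 1.0)
        frac = 1.0;

    return relpages * frac;
}
```

With relpages = 1000 and fractions {0.5, 0.25, 0.25}, reading only the second column gives an estimate of 250 pages, and reading the first two gives 750. This mirrors how rows is derived from tuples times a selectivity.
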
> 0003-ZedStore-use-extended-ANAlYZE-API.patch is attached only to show how Zedstore uses the
> previous patches to:
> 1. use logical block ids to acquire the sample rows.
> 2. acquire sample rows only from a specified column c1; this is used when the user analyzes the table
> on specified columns only, e.g. "analyze zs (c1)".
> 3. provide extra disk size info during ANALYZE, so that ANALYZE computes the
> physical fraction statistic of each column and the planner uses it to estimate the IO cost based on
> the selected columns.

I couldn't find an entry for that patchset in the next commitfest.
Could you register it so that it won't be forgotten?
