Re: Extracting only the columns needed for a query

From: Pengzhou Tang <ptang(at)pivotal(dot)io>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Extracting only the columns needed for a query
Date: 2020-02-14 09:59:39
Message-ID: CAG4reAQc9vYdmQXh=1D789x8XJ=gEkV+E+fT9+s9tOWDXX3L9Q@mail.gmail.com
Lists: pgsql-hackers

> > > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >
> > > Another reason for having the planner do this is that presumably, in
> > > an AM that's excited about this, the set of fetched columns should
> > > play into the cost estimates for the scan. I've not been paying
> > > enough attention to the tableam work to know if we've got hooks for
> > > the AM to affect scan costing ... but if we don't, that seems like
> > > a hole that needs plugged.
> >
> > AM callback relation_estimate_size exists currently which planner leverages.
> > Via this callback it fetches tuples, pages, etc.. So, our thought is to extend
> > this API if possible to pass down needed column and help perform better costing
> > for the query. Though we think if wish to leverage this function, need to know
> > list of columns before planning hence might need to use query tree.
>
> I believe it would be beneficial to add this potential API extension patch into
> the thread (as an example of an interface defining how scanCols could be used)
> and review them together.

Thanks for your suggestion. We paste one potential API extension change
below, showing how zedstore could use scanCols.

The change consists of three patches that illustrate our idea.

0001-ANALYZE.patch is a generic patch that extends the ANALYZE API. We
developed it to make the analysis of zedstore tables more accurate. The API
is more flexible now; for example, a table AM can provide a logical block
number as the random sample seed, can analyze only the specified columns,
and can provide extra information besides the data tuple.

0002-Planner.patch is the actual patch showing how we use rte->scanCols for
cost estimation. The main idea is to add a new metric, 'stadiskfrac', to the
pg_statistic catalog. 'stadiskfrac' is the physical size ratio of a column
and is calculated when ANALYZE is performed; 0001-ANALYZE.patch helps by
providing the extra disk size information. When the planner calls
set_plain_rel_size(), it uses rte->scanCols and 'stadiskfrac' to adjust
rel->pages; please see set_plain_rel_page_estimates().
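
To make the page adjustment concrete, here is a minimal standalone sketch of
the idea. It is not code from the attached patch: the function name
estimate_scan_pages, its parameters, and the assumption that the per-column
fractions lie in [0, 1] and sum to roughly 1 are all hypothetical and chosen
only for illustration. It sums the disk fraction of the needed columns and
scales the relation's page count by that sum.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): scale a relation's page count
 * by the summed on-disk fraction of the columns a scan actually needs.
 *
 * total_pages - pages in the whole relation
 * diskfrac    - per-column share of the physical size; assumed to follow
 *               the 'stadiskfrac' idea, with values in [0, 1]
 * needed      - needed[i] is true when column i is in the scan column set
 * ncols       - number of columns
 */
double
estimate_scan_pages(double total_pages, const double *diskfrac,
                    const bool *needed, int ncols)
{
    double frac = 0.0;
    double pages;
    long   whole;
    int    i;

    assert(ncols >= 0);

    /* Add up the physical size share of every needed column. */
    for (i = 0; i < ncols; i++)
    {
        if (needed[i])
            frac += diskfrac[i];
    }

    /* Guard against statistics that sum to slightly more than 1. */
    if (frac > 1.0)
        frac = 1.0;

    /* Round up to a whole number of pages. */
    pages = total_pages * frac;
    whole = (long) pages;
    if ((double) whole < pages)
        whole++;
    pages = (double) whole;

    /* A non-empty relation is assumed to cost at least one page to scan. */
    if (total_pages > 0.0 && pages < 1.0)
        pages = 1.0;

    return pages;
}
```

For example, with a 1000-page relation whose four columns occupy 0.5, 0.25,
0.125 and 0.125 of the physical size, a scan needing only the second and
third columns would be estimated at 375 pages under this sketch.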

0003-ZedStore.patch is an example of how zedstore uses the extended ANALYZE
API. I paste it here anyway, in case someone is interested in it.

Thanks,
Pengzhou

Attachment Content-Type Size
0001-ANALYZE-tableam-API-change.patch application/x-patch 42.5 KB
0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch application/x-patch 8.2 KB
0003-ZedStore-use-extended-ANAlYZE-API.patch application/x-patch 8.8 KB
