Re: Extracting only the columns needed for a query

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, david(dot)g(dot)kimura(at)gmail(dot)com
Subject: Re: Extracting only the columns needed for a query
Date: 2020-01-07 23:40:52
Message-ID: CAAKRu_YxyYOCCO2e83UmHb51sky1hXgeRzQw-PoqT1iHj2ZKVg@mail.gmail.com
Lists: pgsql-hackers

I just wanted to address a question we got about adding scanCols
instead of using RelOptInfo->attr_needed.

David Rowley had asked:

> For planning, isn't this information already available via either
> targetlists or from the RelOptInfo->attr_needed array combined with
> min_attr and max_attr?

attr_needed does not have every needed attribute set. Attributes
are only added in add_vars_to_targetlist(), and this is only called for
certain query classes.

Also, Jeff Davis had asked off-list why we didn't start using
RelOptInfo->attr_needed for our purpose (marking which columns will
need to be scanned for the use of table access methods) instead of
adding a new member to RangeTblEntry.

The primary reason is that RelOptInfos are used during planning and
not execution. We need access to this information somewhere in the
PlannedStmt.

Even if we used attr_needed, we would, at some point, need to transfer
the columns needed to be scanned to a data structure available during
execution.

However, the next question was why not use attr_needed during costing
(which is the other time the table access method can use scanCols).

David Kimura and I dug into how attr_needed is currently used in
Postgres to understand whether its meaning and usage are consistent
with using it instead of scanCols during costing.

We could set attr_needed in the same way we are now setting scanCols
and then use it during costing. However, besides the fact that we
would still have to create a member like scanCols for execution
anyway, adding an additional meaning to attr_needed during planning
seems confusing and dangerous.

attr_needed is documented as being used to determine the highest
joinrel in which an attribute is needed (as of its introduction in
835bb975d8d).
Since then it has been extended to be used for removing joins and
relations from queries (b78f6264eba33) and to determine whether
whole-row vars are used in a baserel, which isn't supported as a
participant in a partition-wise join (7cfdc77023ad507317).

It actually seems like attr_needed might be too general a name for
this field.
It isn't set for every attribute in a query -- only in specific cases
for attributes in certain parts of the query. If a developer checks
attr_needed for the columns in their query, those columns will often
not be present.
It might actually be better to rename attr_needed to clarify its
usage.
scanCols, on the other hand, has a clear meaning and usage. For table
access methods, scanCols are the columns that need to be scanned.
There might even be cases where this ends up being a different set
than attr_needed for a base rel.

Melanie & David
