Re: Extracting only the columns needed for a query

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Extracting only the columns needed for a query
Date: 2019-07-17 01:49:10
Message-ID: CAAKRu_ZQ0Jy7LfZDCY0JdxChdpja9rf-S8Y5+U4vX7cYJd62dA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We implemented Approach B in the attached patch set (patch 0001) and
then implemented Approach A (patch 0002) to sanity check the pruned
list of columns to scan we were getting at plan-time.
We emit a notice in SeqNext() if the two column sets differ.
Currently, for all of the queries in the regression suite, no
differences are found.
We did notice that none of the current regression tests for triggers
are showing a difference between columns that can be extracted at plan
time and those that can be extracted at execution time--though we
haven't dug into this at all.

In our implementation of Approach B, we extract the columns to scan in
make_one_rel() after set_base_rel_sizes() and before
set_base_rel_pathlists() so that the scan cols can be used in costing.
We do it after calling set_base_rel_sizes() because it eventually
calls set_append_rel_size() which adds PathTarget exprs for the
partitions with Vars having the correct varattno and varno.

We added the scanCols to RangeTblEntries because it seemed like the
easiest way to make sure the information was available at scan time
(as suggested by David).

A quirk in the patch for Approach B is that, in inheritance_planner(),
we copy over the scanCols which we populated in each subroot's rtable
to the final rtable.
The rtables for all of the subroots seem to be basically unused after
finishing with inheritance_planner(). That is, many other members of
the modified child PlannerInfos are copied over to the root
PlannerInfo, but the rtables seem to be an exception.
If we want to get at them later, we would have had to go rooting
around in the pathlist of the RelOptInfo, which seemed very complex.

One note: we did not add any logic to make the extraction of scan
columns conditional on the AM. We have code to do it conditionally in
the zedstore patch, but we made it generic here.

While we were implementing this, we found that in the columns
extracted at plan-time, there were excess columns for
UPDATE/DELETE...RETURNING on partition tables.
Vars for these columns are being added to the targetlist in
preprocess_targetlist(). Eventually targetlist items are added to
RelOptInfo->reltarget exprs.
However, when result_relation is 0, this means all of the vars from
the returningList will be added to the targetlist, which seems
incorrect. We included a patch (0003) to only add the vars if
result_relation is not 0.

Adding the scanCols in RangeTblEntry, we noticed that a few other
members of RangeTblEntry are consistently initialized to NULL whenever
a new RangeTblEntry is being made.
This is confusing because makeNode() for a RangeTblEntry does
palloc0() anyway, so, why is this necessary?
If it is necessary, why not create some kind of generic initialization
function to do this?

Thanks,
Ashwin & Melanie

Attachment Content-Type Size
v1-0002-Execution-time-scan-col-extraction-and-comparison.patch text/x-patch 4.6 KB
v1-0001-Plan-time-extraction-of-scan-cols.patch text/x-patch 9.7 KB
v1-0003-Avoid-adding-returningList-for-invalid-result_rel.patch text/x-patch 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-17 01:51:44 pg_stat_replication lag fields return non-NULL values even with NULL LSNs
Previous Message Michael Paquier 2019-07-17 01:38:35 Re: pg_receivewal documentation