From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extracting only the columns needed for a query
Date: 2019-06-15 17:01:41
Message-ID: 23194.1560618101@sss.pgh.pa.us
Lists: pgsql-hackers

Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this are
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improve costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.

The thing that most approaches to this have fallen down on is triggers ---
that is, a trigger function might access columns mentioned nowhere in the
SQL text. (See 8b6da83d1 for a recent example :-() If you have a plan
for dealing with that, then ...

> Approach B: after parsing and/or after planning

If we wanted to do something about this, making the planner record
the set of used columns seems like the thing to do. We could avoid
the expense of doing it when it's not needed by setting up an AM/FDW/
etc property or callback to request it.
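
For concreteness, something along these lines -- purely a sketch of the
opt-in idea, where the flag name and the initializer fragment are
invented for illustration and are not existing API:

/*
 * Hypothetical sketch only: the AM opts in, so the planner knows it
 * should bother to compute and record the referenced-column set.
 * "scans_want_column_projection" is an invented field.
 */
static const TableAmRoutine zedstoream_methods = {
    .type = T_TableAmRoutine,
    /* ... the usual required callbacks ... */
    .scans_want_column_projection = true,   /* invented field */
};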

Another reason for having the planner do this is that presumably, in
an AM that's excited about this, the set of fetched columns should
play into the cost estimates for the scan. I've not been paying
enough attention to the tableam work to know if we've got hooks for
the AM to affect scan costing ... but if we don't, that seems like
a hole that needs to be plugged.
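
Just to illustrate the shape of it (invented names throughout, apart
from cpu_tuple_cost, which is the existing GUC): a column-aware AM could
fold the fetched-column count into its per-tuple run cost, roughly

#include "postgres.h"
#include "optimizer/cost.h"     /* Cost, cpu_tuple_cost */

/*
 * Hypothetical illustration, not an existing tableam hook: charge per
 * column actually decoded rather than per whole row.
 */
static Cost
columnar_scan_run_cost(double tuples, int ncols_fetched,
                       Cost per_column_decode_cost)
{
    return tuples * (cpu_tuple_cost +
                     ncols_fetched * per_column_decode_cost);
}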

> I'm not sure, however, if just getting the
> PathTargets from the RelOptInfos is sufficient for obtaining the
> whole set of columns used in the query.

The PathTarget only records the columns that need to be *emitted*
by the relation scan plan node. You'd also have to look at the
baserestrictinfo conditions to see what columns are inspected by
filter conditions. But that seems fine to me anyway since the AM
might conceivably have different requirements for filter variables than
emitted variables. (As a concrete example, filter conditions that are
handled by non-lossy index checks might not give rise to any requirement
for the table AM to do anything.) So that line of thought confirms that
we want to do this at the end of planning when we know the shape of the
plan tree; the parser can't do it usefully.
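
At the level of a baserel, that collection could look like the sketch
below (helper name is just for illustration; headers as of the v12
tree).  pull_varattnos() already does the legwork, with the usual
FirstLowInvalidHeapAttributeNumber offset in the result:

#include "postgres.h"
#include "access/sysattr.h"         /* FirstLowInvalidHeapAttributeNumber */
#include "nodes/pathnodes.h"        /* RelOptInfo, RestrictInfo */
#include "optimizer/optimizer.h"    /* pull_varattnos */

/*
 * Sketch only: attributes a base relation's scan must deal with -- the
 * columns it has to emit (reltarget) plus those inspected by filter
 * quals (baserestrictinfo).  Result members are
 * attno - FirstLowInvalidHeapAttributeNumber.
 */
static Bitmapset *
scan_attrs_needed(RelOptInfo *rel)
{
    Bitmapset  *attrs = NULL;
    ListCell   *lc;

    /* columns the scan node must emit */
    pull_varattnos((Node *) rel->reltarget->exprs, rel->relid, &attrs);

    /* columns referenced by quals applied at the scan */
    foreach(lc, rel->baserestrictinfo)
    {
        RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);

        pull_varattnos((Node *) rinfo->clause, rel->relid, &attrs);
    }

    return attrs;
}

A real version done at the end of planning would presumably consult the
quals actually attached to the chosen scan path rather than raw
baserestrictinfo, per the index-check point above.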

> Approach B, however, does not work for utility statements which do
> not go through planning.

I'm not sure why you're excited about that case? Utility statements
tend to be pretty much all-or-nothing as far as data access goes.

regards, tom lane
