Re: Projection pushdown to index access method

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Chris Cleveland <ccleveland(at)dieselpoint(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Projection pushdown to index access method
Date: 2023-09-19 17:10:29
Message-ID: CA+TgmobizmMY1DXNqLriOOaVBBUrFruh_CAzkMh-6556mk873A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 19, 2023 at 12:35 PM Chris Cleveland
<ccleveland(at)dieselpoint(dot)com> wrote:
> I'm working on an index access method. I have a function which can appear in a projection list which should be evaluated by the access method itself. Example:
>
> SELECT title, my_special_function(body)
> FROM books
> WHERE book_id <===> 42;
>
> "<===>" is the operator that invokes the access method. The value returned by my_special_function() gets calculated during the index scan, and depends on information that exists only in the index.
>
> How do I get the system to pull the value from the index instead of trying to calculate it?

I don't see how you can do this in general, because there's no
guarantee that the plan will be an Index Scan or Index Only Scan
instead of a Seq Scan or Bitmap Heap/Index Scan.

> So far, I have created a CustomScan and set it using set_rel_pathlist_hook. The hook function gives us a PlannerInfo, RelOptInfo, Index, and RangeTblEntry. So far as I can tell, only RelOptInfo.reltarget.exprs gives us any info on the SELECT expressions, but unfortunately, the exprs are Var nodes that contain the (title, body) columns from above, and do not say anything about my_special_function().

So what does the EXPLAIN plan look like?

I'm not quite sure what's happening here, but the planner likes to
make plans that just fetch attributes from all the relations being
joined (here, there's just one) and then perform the calculation of
any expressions at the very end, as the final step, or at least as the
final step at that subquery level. And if it plans to ask your custom
scan for title, body, and book_id and then compute
my_special_function(body) after the fact, the thing you want to happen
is not going to happen. If the planner can be induced to ask your
custom scan for my_special_function(body), then I *think* you should
be able to arrange to get that value any way you like and just return
it. But I don't quite know how to induce the planner to do that -- and
especially if this query involved more than one table, because of the
planner's tendency to postpone expression evaluation until after joins
are done.

> I know this is possible, because the docs for PathTarget say this:
>
> PathTarget
> *
> * This struct contains what we need to know during planning about the
> * targetlist (output columns) that a Path will compute. Each RelOptInfo
> * includes a default PathTarget, which its individual Paths may simply
> * reference. However, in some cases a Path may compute outputs different
> * from other Paths, and in that case we make a custom PathTarget for it.
> * For example, an indexscan might return index expressions that would
> * otherwise need to be explicitly calculated.

It's just worth keeping in mind that the planner and the executor are
very tightly bound together here. This may be one of those cases
getting the executor to do what you want is the easy part, and getting
the planner to produce a plan that tells it to do that is the hard
part.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-09-19 17:33:22 Re: Disabling Heap-Only Tuples
Previous Message Andres Freund 2023-09-19 16:56:33 Re: Disabling Heap-Only Tuples