Re: Index-only scan for "f(x)" without "x"

From: Malthe <mborch(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Index-only scan for "f(x)" without "x"
Date: 2020-01-23 07:39:21
Message-ID: CAAPh5Fm9Nx=t3QKHHtSH8D0RbaxONjVAUT0AK_gdCT-149E0sQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 23 Jan 2020 at 00:00, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The problem is that the planner's initial analysis of the query tree
> concludes that the scan of "tab" has to return "x", because it looks
> through the tree for plain Vars, and "x" is what it's going to find.
> This conclusion is in fact true for any plan that doesn't involve
> scanning a suitable index on f(x), so it's not something we can just
> dispense with.

If f(x) was actually added to the table as a virtual generated column
V then the planner could perhaps reasonably find that a particular
expression depends in part on V (rather than the set of real columns
underlying the virtual generated column).

That is, the planner now knows that it needs to return V = "f(x)"
rather than "x" and so it can match the requirement to our index and
decide to use an index-only scan.

> To back up from that and conclude that the indexscan doesn't really
> need to return "x" because every use of it is in the context "f(x)"
> seems like a pretty expensive proposition, especially once you start
> considering index expressions that are more complex than a single
> function call. A brute-force matching search could easily be O(N^2)
> or worse in the size of the query tree, which is a lot to pay when
> there's a pretty high chance of failure. (IOW, we have to expect
> that most queries on "tab" are in fact not going to be able to use
> that index, so we can't afford to spend a lot of planning time just
> because the index exists.)

In the approach outlined above, the set of expressions to match on
would be limited by the set of virtual generated columns defined on
the table — some sort of prefix tree that allows efficient matching on
a given expression syntax tree? There would be no cost to this on a
table that had no virtual generated columns.

The question is whether this is a reasonable take on virtual generated columns.

--- regards

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-01-23 08:20:46 Re: progress report for ANALYZE
Previous Message Amit Langote 2020-01-23 07:31:12 Re: Run-time pruning for ModifyTable