Re: Use of additional index columns in rows filtering

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-08-03 01:32:40
Message-ID: CAH2-WzmCduTR7bHYkNEu-RafvObYRqyVoBbzfesVROmr6W1_Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> How come we don't know that until the execution time? Surely when
> building the paths/plans, we match the clauses to the index keys, no? Or
> are you saying that just having a scan key is not enough for it to be
> "access predicate"?

In principle we can and probably should recognize the difference
between "Access Predicates" and "Index Filter Predicates" much earlier
on, during planning. Probably when index paths are first built.

It seems quite likely that there will turn out to be at least 2 or 3
reasons to do this. The EXPLAIN usability issue might be enough of a
reason on its own, though.

> Anyway, this patch is mostly about "Index Cond" mixing two types of
> predicates. But the patch is really about "Filter" predicates - moving
> some of them from table to index. So quite similar to the "index filter
> predicates" except that those are handled by the index AM.

I understand that that's how the patch is structured. It is
nevertheless possible (as things stand) that the patch will make the
planner shift from a plan that uses "Access Predicates" to the maximum
extent possible when scanning a composite index, to a similar plan
that has a similar index scan, for the same index, but with fewer
"Access Predicates" in total. In effect, the patched planner will swap
one type of predicate for another type because doing so enables the
executor to scan an index once instead of scanning it several times.

I don't dispute the fact that this can only happen when the planner
believes (with good reason) that the expected cost will be lower. But
I maintain that there is a novel risk to be concerned about, which is
meaningfully distinct from the general risk of regressions that comes
from making just about any change to the planner. The important
principle here is that we should have a strong bias in the direction
of making quals into true "Access Predicates" whenever practical.

Yeah, technically the patch doesn't directly disrupt how existing
index paths get generated. But it does have the potential to disrupt
it indirectly, by providing an alternative very-similar index path
that's likely to outcompete the existing one in these cases. I think
that we should have just one index path that does everything well
instead.

> But differentiating between access and filter predicates (at the index
> AM level) seems rather independent of what this patch aims to do.

My concern is directly related to the question of "access predicates
versus filter predicates", and the planner's current ignorance on
which is which. The difference may not matter too much right now, but
ISTM that your patch makes it matter a lot more. And so in that
indirect sense it does seem relevant.

The planner has always had a strong bias in the direction of making
clauses that can be index quals into index quals, rather than filter
predicates. It makes sense to do that even when they aren't very
selective. This is a similar though distinct principle.

It's awkward to discuss the issue, since we don't really have official
names for these things just yet (I'm just going with what Markus calls
them for now). And because there is more than one type of "index
filter predicate" in play with the patch (namely those in the index
AM, and those in the index scan executor node). But my concern boils
down to access predicates being strictly better than equivalent index
filter predicates.

> I'm not following. Why couldn't there be some second-order effects?
> Maybe it's obvious / implied by something said earlier, but I think
> every time we decide between multiple choices, there's a risk of picking
> wrong.

Naturally, I agree that some amount of risk is inherent. I believe
that the risk I have identified is qualitatively different to the
standard, inherent kind of risk.

> Anyway, is this still about this patch or rather about your SAOP patch?

It's mostly not about my SAOP patch. It's just that my SAOP patch will
do exactly the right thing in this particular case (at least once
combined with Alena Rybakina's OR-to-SAOP transformation patch). It is
therefore quite clear that a better plan is possible in principle.
Clearly we *can* have the benefit of only one single index scan (i.e.
no BitmapOr node combining multiple similar index scans), without
accepting any novel new risk to get that benefit. We should just have
one index path that does it all, while avoiding duplicative index
paths that embody a distinction that really shouldn't exist -- it
should be dealt with at runtime instead.

> True. IMHO the danger or "index filter" predicates is that people assume
> index conditions eliminate large parts of the index - which is not
> necessarily the case here. If we can improve this, cool.

I think that we'd find a way to use the same information in the
planner, too. It's not just users that should care about the
difference. And I don't think that it's particularly likely to be
limited to SAOP/MDAM stuff. As Markus said, we're talking about an
important general principle here.

> But again, this is not what this patch does, right? It's about moving
> stuff from "table filter" to "index filter". And those clauses were not
> matched to the index AM at all, so it's not really relevant to the
> discussion about different subtypes of predicates.

I understand that what I've said is not particularly helpful. At least
not on its own. You quite naturally won't want to tie the fate of this
patch to my SAOP patch, which is significantly more complicated. I do
think that my concern about this being a novel risk needs to be
carefully considered.

Maybe it's possible to address my concern outside of the context of my
own SAOP patch. That would definitely be preferable all around. But
"access predicates versus filter predicates" seems important and
relevant, either way.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2023-08-03 02:24:48 Fix bogus Asserts in calc_non_nestloop_required_outer
Previous Message Andy Fan 2023-08-03 00:50:47 Re: Extract numeric filed in JSONB more effectively