Re: Use of additional index columns in rows filtering

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-08-03 11:57:10
Message-ID: a899742e-7c75-b8ea-c3be-05b6ee1c76c3@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/3/23 03:32, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> How come we don't know that until the execution time? Surely when
>> building the paths/plans, we match the clauses to the index keys, no? Or
>> are you saying that just having a scan key is not enough for it to be
>> "access predicate"?
>
> In principle we can and probably should recognize the difference
> between "Access Predicates" and "Index Filter Predicates" much earlier
> on, during planning. Probably when index paths are first built.
>
> It seems quite likely that there will turn out to be at least 2 or 3
> reasons to do this. The EXPLAIN usability issue might be enough of a
> reason on its own, though.
>

OK

>> Anyway, this patch is mostly about "Index Cond" mixing two types of
>> predicates. But the patch is really about "Filter" predicates - moving
>> some of them from table to index. So quite similar to the "index filter
>> predicates" except that those are handled by the index AM.
>
> I understand that that's how the patch is structured. It is
> nevertheless possible (as things stand) that the patch will make the
> planner shift from a plan that uses "Access Predicates" to the maximum
> extent possible when scanning a composite index, to a similar plan
> that has a similar index scan, for the same index, but with fewer
> "Access Predicates" in total. In effect, the patched planner will swap
> one type of predicate for another type because doing so enables the
> executor to scan an index once instead of scanning it several times.
>

That seems very much like something the costing is meant to handle, no?
I mean, surely "access predicate" and "filter" should affect the cost
differently, with "filter" being more expensive (and table filter being
more expensive than index filter).

I was not sure why would the patch affect the plan choice, as it does
not really affect the index predicates (passed to the AM) at all. But I
think I get the point now - the thing is in having two different index
paths, like:

PATH #1: access predicates (A,B)
PATH #2: access predicate A, index filter B

AFAICS the assumption is that path #1 should be better, as it has two
proper access predicates. But maybe if we add another condition C, it
might end up like this:

PATH #1: access predicates (A,B), table filter C
PATH #2: access predicate A, index filter (B,C)

and #2 will end up winning.

I still think this seems more like a costing issue (and I'd guess we may
already have similar cases for index-only scans).

IMO we can either consider the different predicate types during costing.
Sure, then we have the usual costing risks, but that's expected.

Or we could just ignore this during costing entirely (and ditch the
costing from the patch). Then the cost doesn't change, and we don't have
any new risks.

> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.
>

Yeah, I think that's the scenario I described above ...

>> But differentiating between access and filter predicates (at the index
>> AM level) seems rather independent of what this patch aims to do.
>
> My concern is directly related to the question of "access predicates
> versus filter predicates", and the planner's current ignorance on
> which is which. The difference may not matter too much right now, but
> ISTM that your patch makes it matter a lot more. And so in that
> indirect sense it does seem relevant.
>

I'm not sure my patch makes it matter a lot more. Yes, moving a filter
from the table to the index may lower the scan cost, but that can happen
for a lot of other reasons ...

> The planner has always had a strong bias in the direction of making
> clauses that can be index quals into index quals, rather than filter
> predicates. It makes sense to do that even when they aren't very
> selective. This is a similar though distinct principle.
>
> It's awkward to discuss the issue, since we don't really have official
> names for these things just yet (I'm just going with what Markus calls
> them for now). And because there is more than one type of "index
> filter predicate" in play with the patch (namely those in the index
> AM, and those in the index scan executor node). But my concern boils
> down to access predicates being strictly better than equivalent index
> filter predicates.
>

I like the discussion, but it feels a bit abstract (and distant from
what the patch aimed to do) and I have trouble turning it into something
actionable.

>> I'm not following. Why couldn't there be some second-order effects?
>> Maybe it's obvious / implied by something said earlier, but I think
>> every time we decide between multiple choices, there's a risk of picking
>> wrong.
>
> Naturally, I agree that some amount of risk is inherent. I believe
> that the risk I have identified is qualitatively different to the
> standard, inherent kind of risk.
>
>> Anyway, is this still about this patch or rather about your SAOP patch?
>
> It's mostly not about my SAOP patch. It's just that my SAOP patch will
> do exactly the right thing in this particular case (at least once
> combined with Alena Rybakina's OR-to-SAOP transformation patch). It is
> therefore quite clear that a better plan is possible in principle.
> Clearly we *can* have the benefit of only one single index scan (i.e.
> no BitmapOr node combining multiple similar index scans), without
> accepting any novel new risk to get that benefit. We should just have
> one index path that does it all, while avoiding duplicative index
> paths that embody a distinction that really shouldn't exist -- it
> should be dealt with at runtime instead.
>

Does this apply to the index scan vs. index-only scans too? That is, do
you think we should we have just one index-scan path, doing index-only
stuff when possible?

>> True. IMHO the danger or "index filter" predicates is that people assume
>> index conditions eliminate large parts of the index - which is not
>> necessarily the case here. If we can improve this, cool.
>
> I think that we'd find a way to use the same information in the
> planner, too. It's not just users that should care about the
> difference. And I don't think that it's particularly likely to be
> limited to SAOP/MDAM stuff. As Markus said, we're talking about an
> important general principle here.
>

If we want/need to consider this during costing, this seems necessary.

>> But again, this is not what this patch does, right? It's about moving
>> stuff from "table filter" to "index filter". And those clauses were not
>> matched to the index AM at all, so it's not really relevant to the
>> discussion about different subtypes of predicates.
>
> I understand that what I've said is not particularly helpful. At least
> not on its own. You quite naturally won't want to tie the fate of this
> patch to my SAOP patch, which is significantly more complicated. I do
> think that my concern about this being a novel risk needs to be
> carefully considered.
>
> Maybe it's possible to address my concern outside of the context of my
> own SAOP patch. That would definitely be preferable all around. But
> "access predicates versus filter predicates" seems important and
> relevant, either way.
>

If we can form some sort of plan what needs to be done (both for my
patch and for the SAOP patch), I'm willing to work on it ... But it's
not quite clear to me what the requirements are.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laetitia Avrot 2023-08-03 12:25:07 Re: Adding a pg_servername() function
Previous Message Melih Mutlu 2023-08-03 11:29:59 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication