Re: Use of additional index columns in rows filtering

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-08-03 18:50:41
Message-ID: CAH2-Wz=snJ+vEN6aUcmUGi7M2h0yCgwK_GXhdHFf1qEdOwRWag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > I understand that that's how the patch is structured. It is
> > nevertheless possible (as things stand) that the patch will make the
> > planner shift from a plan that uses "Access Predicates" to the maximum
> > extent possible when scanning a composite index, to a similar plan
> > that has a similar index scan, for the same index, but with fewer
> > "Access Predicates" in total. In effect, the patched planner will swap
> > one type of predicate for another type because doing so enables the
> > executor to scan an index once instead of scanning it several times.
> >
>
> That seems very much like something the costing is meant to handle, no?
> I mean, surely "access predicate" and "filter" should affect the cost
> differently, with "filter" being more expensive (and table filter being
> more expensive than index filter).

I'm not 100% sure that it's not a costing issue, but intuitively it
doesn't seem like one.

As Goetz Graefe once said, "choice is confusion". It seems desirable
to have fewer, better index paths. This is possible whenever there is
a way to avoid the index paths that couldn't possibly be better in the
first place. Though we must also make sure that there is no real
downside -- possibly by teaching the executor to behave adaptively
instead of needlessly trusting what the planner says. Turning a plan
time decision into a runtime decision seems strictly better.

Obviously the planner will always need to be trusted to a significant
degree (especially for basic things like join order), but why not
avoid it when we can avoid it without any real downsides? Having lots
of slightly different index paths with slightly different types of
logically equivalent predicates seems highly undesirable, and quite
avoidable.

ISTM that it should be possible to avoid generating some of these
index paths based on static rules that assume that:

1. An "access predicate" is always strictly better than an equivalent
"index filter predicate" (for any definition of "index filter
predicate" you can think of).
2. An "Index Filter: " is always strictly better than an equivalent
"Filter: " (i.e. table filter).

The first item is what I've been going on about, of course. The second
item is the important principle behind your patch -- and one that I
also agree with. I don't see any contradictions here -- these two
principles are compatible. I think that we can have it both ways.

> AFAICS the assumption is that path #1 should be better, as it has two
> proper access predicates. But maybe if we add another condition C, it
> might end up like this:
>
> PATH #1: access predicates (A,B), table filter C
> PATH #2: access predicate A, index filter (B,C)
>
> and #2 will end up winning.

Why wouldn't we expect there to also be this path:

PATH #3: access predicates (A,B), index filter C

And why wouldn't we also expect this other path to always be better?
So much better that we don't even need to bother generating PATH #1
and PATH #2 in the first place, even?

Right now there are weird reasons why it might not be so -- strange
interactions with things like BitmapOr nodes that could make either
PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
particularly fundamental to me. We should probably just avoid those
plan shapes that have the potential to make PATH #1 and PATH #2
slightly cheaper, due only to perverse interactions.

> I like the discussion, but it feels a bit abstract (and distant from
> what the patch aimed to do) and I have trouble turning it into something
> actionable.

I think that I have gotten a lot out of this discussion -- it has made
my thinking about this stuff more rigorous. I really appreciate that.

> Does this apply to the index scan vs. index-only scans too? That is, do
> you think we should we have just one index-scan path, doing index-only
> stuff when possible?

I think so, yes. But index-only scans don't appear under BitmapOr
nodes, so right now I can't think of an obvious way of demonstrating
that this is true. Maybe it accidentally doesn't come up with
index-only scans in practice, but the same underlying principles
should be just as true.

> If we can form some sort of plan what needs to be done (both for my
> patch and for the SAOP patch), I'm willing to work on it ... But it's
> not quite clear to me what the requirements are.

I do hope to have more concrete proposals soon. Thanks for being patient.

For what it's worth, I actually think that there is a good chance that
I'll end up relying on what you've done here to make certain things I
want to do with the SOAP patch okay. It would be rather convenient to
be able to handle some of the SAOP safety issues without needing any
table filters (just index filters), in some corner cases. I think that
what you're doing here makes a lot of sense. FWIW, I am already
personally invested in the success of your patch.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-08-03 18:53:56 Re: Using defines for protocol characters
Previous Message Tomas Vondra 2023-08-03 18:17:05 Re: Use of additional index columns in rows filtering