Re: Use of additional index columns in rows filtering

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-07-21 11:52:55
Message-ID: 3295927a-46c1-cb68-e154-780c50f0fccf@enterprisedb.com
Lists: pgsql-hackers

On 7/21/23 05:32, Peter Geoghegan wrote:
> On Thu, Jul 20, 2023 at 4:35 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> I think the SAOP patch may need to be much more careful about this, but
>> for this patch it's simpler because it doesn't really change any of the
>> index internals, or the indexscan in general.
>
> It's true that the SAOP patch needs relatively complicated
> infrastructure to assess whether or not the technique is safe with a
> given set of quals. You cannot safely get an ordered index scan for
> something like "select * from table where a < 5 and b in (1,2,3) order
> by a, b". With or without my patch. My patch isn't really changing all
> that much about the behavior in nbtree, as these things go. It's
> *surprising* how little has to change about the high level structure
> of index scans, in fact.
>
> (Actually, I'm glossing over a lot. The MDAM paper describes
> techniques that'd make even the really challenging cases safe, through
> a process of converting quals from conjunctive normal form into
> disjunctive normal form. This is more or less the form that the state
> machine implemented by _bt_advance_array_keys() produces already,
> today. But even with all this practically all of the heavy lifting
> takes place before the index scan even begins, during preprocessing --
> so you still require surprisingly few changes to index scans
> themselves.)
>

Ah, OK. I was assuming the execution might be more complex. But I was
thinking more about the costing part - if you convert the clauses in
some way, does that affect the reliability of estimates somehow? If the
conversion from AND to OR makes the list of clauses more complex, that
might be an issue ...
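
(To make sure I understand the conversion - presumably it's something
like rewriting

WHERE a < 5 AND b IN (1, 2, 3)

into the disjunctive form

WHERE (a < 5 AND b = 1) OR (a < 5 AND b = 2) OR (a < 5 AND b = 3)

in which case the costing has to deal with a longer list of clauses,
and the estimation errors may compound.)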

The index-only filter does no such conversion; it just uses the same
clauses as before.

>> If I simplify that a bit, we're just reordering the clauses in a way to
>> maybe eliminate the heap fetch. The main risk seems to be that this will
>> force an expensive qual to the front of the list, just because it can be
>> evaluated on the index tuple.
>
> My example query might have been poorly chosen, because it involved a
> limit. What I'm thinking of is more general than that.
>

I wasn't really thinking about LIMIT, and I don't think it changes the
overall behavior very much (sure, it's damn difficult to estimate for
skewed data sets, but meh).

The case I had in mind is something like this:

CREATE TABLE t (a int, b int, c int);
CREATE INDEX ON t (a);
INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i);

SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;

where bad_qual is expensive and matches almost all rows. Without the
index-only filters, we'd evaluate the conditions in this order:
[b < 1], [c < 1], [bad_qual(a)]

but with naive index-only filters we do this:

[bad_qual(a)], [b < 1], [c < 1]

which is bad as it runs the expensive thing on every row.
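
For the record, bad_qual might be defined like this (just a sketch to
make the example self-contained - the exact body doesn't matter, only
that it's costly and matches nearly everything):

CREATE FUNCTION bad_qual(int) RETURNS bool
LANGUAGE sql IMMUTABLE COST 10000
AS 'SELECT $1 < 999999';

The high COST is what order_qual_clauses() uses to push the clause to
the end of the list today, and that's exactly the information a naive
index-only filter would ignore.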

FWIW the "ORDER BY" is necessary, because otherwise we may not even
build the index path (no index keys, no interesting pathkeys). It's just
an opportunistic optimization - if already doing index scan, try doing
this too. I wonder if we should relax that ...
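
That is (a sketch of the behavior I mean, not verified plans):

-- no index quals, no interesting pathkeys => index path not built
SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1;

-- the pathkeys make the index path worth building => filter can apply
SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;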

>> But the difference would need to be worse
>> than what we save by not doing the I/O - considering how expensive the
>> I/O is, that seems unlikely. Could happen for expensive quals that don't
>> really eliminate many rows, I guess.
>
> That sounds like the same principle that I was talking about. I think
> that it can be pushed quite far, though. I am mostly talking about the
> worst case, and it seems like you might not be.
>

Yeah, I was proposing the usual costing approach, based on "average"
behavior. It has the usual weakness that if the estimates are far off,
we can have issues. There have been various discussions about maybe
considering how reliable the estimates are, to defend against that -
but that's tough to do.

In a way, focusing on the worst case does that by assuming the worst
combination - which is fine, although it may choose the slower (but
safer) approach in some cases.
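
To illustrate the kind of "average" comparison I mean (very rough,
using default cost parameters and the COST 10000 sketch above, not the
actual costsize.c formulas): evaluating bad_qual first on all 1M rows
costs roughly

1,000,000 * 10000 * cpu_operator_cost (0.0025) = 25,000,000

while the heap fetches it could possibly save are bounded by about

1,000,000 * random_page_cost (4.0) = 4,000,000

so even a crude average-case estimate should keep the expensive qual
at the end - unless the estimates are far off, which gets us back to
the reliability question.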

> You can easily construct examples where some kind of skew causes big
> problems with a multi-column index. I'm thinking of indexes whose
> leading columns are low cardinality, and queries where including the
> second column as an index qual looks kind of marginal to the
> optimizer. Each grouping represented in the most significant index
> column might easily have its own unique characteristics; the
> distribution of values in subsequent columns might be quite
> inconsistent across each grouping, in whatever way.
>
> Since nothing stops a qual on a lower order column having a wildly
> different selectivity for one particular grouping, it might not make
> sense to say that a problem in this area is due to a bad selectivity
> estimate. Even if we have perfect estimates, what good can they do if
> the optimal strategy is to vary our strategy at runtime, *within* an
> individual index scan, as different parts of the key space (different
> groupings) are traversed through? To skip or not to skip, say. This
> isn't about picking the cheapest plan, really.
>

Well, yeah. It's one thing to assign some cost estimate to the plan,
and quite another what actually happens at runtime. It would be nice to
be able to reflect the expected runtime behavior in the cost (otherwise
you get confused users complaining that we pick the "wrong" plan).

> That's another huge advantage of index quals -- they can (at least in
> principle) allow you skip over big parts of the index when it just
> ends up making sense, in whatever way, for whatever reason. In the
> index, and in the heap. Often both. You'd likely always prefer to err
> in the direction of having more index quals rather than fewer, when
> doing so doesn't substantially change the plan itself. It could be
> very cheap insurance, even without any limit. (It would probably also
> be a lot faster, but it needn't be.)
>

True, although my patch doesn't change the number of index quals. It's
just about the other quals.

>> Anyway, I see this as extension of what order_qual_clauses() does. The
>> main issue is that even order_qual_clauses() admits the estimates are
>> somewhat unreliable, so we can't expect to make perfect decisions.
>
> The attribute value independence assumption is wishful thinking, in no
> small part -- it's quite surprising that it works as well as it does,
> really.
>

Yeah. For OLTP it works pretty well, for OLAP not so much.
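
A trivial example of how the independence assumption breaks down (the
exact numbers depend on the statistics, of course):

CREATE TABLE corr (a int, b int);
INSERT INTO corr SELECT i % 100, i % 100 FROM generate_series(1,100000) s(i);
ANALYZE corr;

-- estimated as 100000 * (1/100) * (1/100) = 10 rows, actual is 1000,
-- because b is perfectly correlated with a
SELECT * FROM corr WHERE a = 1 AND b = 1;

That's the kind of thing extended statistics try to address, but the
per-column stats alone can't see it.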

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
