Quick Links

Re: Use of additional index columns in rows filtering

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Use of additional index columns in rows filtering
Date:	2023-06-08 17:34:22
Message-ID:	c6b5da88-3ce1-14f2-f969-d819bd3acc5d@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

I took a stab at this and implemented the trick with the VM - during
index scan, we also extract the filters that only need the indexed
attributes (just like in IOS). And then, during the execution we:

1) scan the index using the scan keys (as before)

2) if the heap page is all-visible, we check the new filters that can
be evaluated on the index tuple

3) fetch the heap tuple and evaluate the filters

This is pretty much exactly the same thing we do for IOS, so I don't see
why this would be incorrect while IOS is correct.

This also adds "Index Filter" to explain output, to show which filters
are executed on the index tuple (at the moment the filters are a subset
of "Filter"), so if the index tuple matches we'll execute them again on
the heap tuple. I guess that could be fixed by having two "filter"
lists, depending on whether we were able to evaluate the index filters.

Most of the patch is pretty mechanical - particularly the planning part
is about identifying filters that can be evaluated on the index tuple,
and that code was mostly shamelessly copied from index-only scan.

The matching of filters to index is done in check_index_filter(), and
it's simpler than match_clause_to_indexcol() as it does not need to
consider operators etc. (I think). But maybe it should be careful about
other things, not sure.

The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
earlier, the idea is to check VM and evaluate the filters on the index
tuple if possible, similar to index-only scans. Except that we then have
to fetch the heap tuple. Unfortunately, this means the code can't use
index_getnext_slot() anymore. Perhaps we should invent a new variant
that'd allow evaluating the index filters in between.

With the patch applied, the query plan changes from:

QUERY PLAN
-------------------------------------------------------------------
Limit (cost=0.42..10929.89 rows=1 width=12)
(actual time=94.649..94.653 rows=0 loops=1)
Buffers: shared hit=197575 read=661
-> Index Scan using t_a_include_b on t
(cost=0.42..10929.89 rows=1 width=12)
(actual time=94.646..94.647 rows=0 loops=1)
Index Cond: (a > 1000000)
Filter: (b = 4)
Rows Removed by Filter: 197780
Buffers: shared hit=197575 read=661
Planning Time: 0.091 ms
Execution Time: 94.674 ms
(9 rows)

QUERY PLAN
-------------------------------------------------------------------
Limit (cost=0.42..3662.15 rows=1 width=12)
(actual time=13.663..13.667 rows=0 loops=1)
Buffers: shared hit=544
-> Index Scan using t_a_include_b on t
(cost=0.42..3662.15 rows=1 width=12)
(actual time=13.659..13.660 rows=0 loops=1)
Index Cond: (a > 1000000)
Index Filter: (b = 4)
Rows Removed by Index Recheck: 197780
Filter: (b = 4)
Buffers: shared hit=544
Planning Time: 0.105 ms
Execution Time: 13.690 ms
(10 rows)

which is much closer to the "best" case:

QUERY PLAN
-------------------------------------------------------------------
Limit (cost=0.42..4155.90 rows=1 width=12)
(actual time=10.152..10.156 rows=0 loops=1)
Buffers: shared read=543
-> Index Scan using t_a_b_idx on t
(cost=0.42..4155.90 rows=1 width=12)
(actual time=10.148..10.150 rows=0 loops=1)
Index Cond: ((a > 1000000) AND (b = 4))
Buffers: shared read=543
Planning Time: 0.089 ms
Execution Time: 10.176 ms
(7 rows)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
0001-evaluate-filters-on-the-index-tuple-when-po-20230608.patch	text/x-patch	39.6 KB

In response to

Re: Use of additional index columns in rows filtering at 2023-02-15 17:01:50 from Tomas Vondra

Responses

Re: Use of additional index columns in rows filtering at 2023-06-21 12:45:19 from James Coleman

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Russell Foster	2023-06-08 17:41:36	Postgres v15 windows bincheck regression test failures
Previous Message	Dave Cramer	2023-06-08 17:31:32	Re: Named Prepared statement problems and possible solutions