Index INCLUDE vs. Bitmap Index Scan

From: Markus Winand <markus(dot)winand(at)winand(dot)at>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Index INCLUDE vs. Bitmap Index Scan
Date: 2019-02-26 20:07:01
Message-ID: 7DF52167-4379-4A1E-A957-90D774EBDF21@winand.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I think Bitmap Index Scan should take advantage of B-tree INCLUDE columns, which it doesn’t at the moment (tested on master as of yesterday).

Consider this (find the setup at the bottom of this mail).

CREATE INDEX idx ON tbl (a, b) INCLUDE (c);

EXPLAIN (analyze, buffers)
SELECT *
FROM tbl
WHERE a = 1
AND c = 1;

The following plan could be produced:

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=4.14..8.16 rows=1 width=7616) (actual time=0.027..0.029 rows=1 loops=1)
Recheck Cond: (a = 1)
Filter: (c = 1)
Rows Removed by Filter: 7
Heap Blocks: exact=2
Buffers: shared hit=2 read=1
-> Bitmap Index Scan on idx (cost=0.00..4.14 rows=1 width=0) (actual time=0.018..0.018 rows=8 loops=1)
Index Cond: (a = 1)
Buffers: shared read=1
Planning Time: 0.615 ms
Execution Time: 0.060 ms

The Bitmap Index Scan does not filter on column C, which is INCLUDEd in the index leaf nodes.

Instead, the Bitmap Heap Scan fetches unnecessary blocks and then applies the filter.

I would expect a similar execution as with this index:

DROP INDEX idx;
CREATE INDEX idx ON tbl (a, b, c);

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=4.14..8.16 rows=1 width=7616) (actual time=0.021..0.021 rows=1 loops=1)
Recheck Cond: ((a = 1) AND (c = 1))
Heap Blocks: exact=1
Buffers: shared hit=1 read=1
-> Bitmap Index Scan on idx (cost=0.00..4.14 rows=1 width=0) (actual time=0.018..0.018 rows=1 loops=1)
Index Cond: ((a = 1) AND (c = 1))
Buffers: shared read=1
Planning Time: 0.123 ms
Execution Time: 0.037 ms

(As a side node: I also dislike it how Bitmap Index Scan mixes search conditions and filters in “Index Cond”)

Due to the not-filtered column B in the index, the use of column C is here pretty much the same as it is for the first index, which has C in INCLUDE.

Am I missing anything or is it just an oversight because INCLUDE was primarily done for Index Only Scan?

As a background, here is the use case I see for this scenario:
Filters that can not be used for searching the tree can be put in INCLUDE instead of the key-part of the index even if there is no intend to run the query as Index Only Scan (e.g. SELECT *). Columns in INCLUDE can be used for filtering (works fine for Index Only Scan, btw).

The most prominent example are post-fix or in-fix LIKE ‘%term%’ filters. The benefit of putting such columns in INCLUDE is that it is clearly documented that the index column is neither used for searching in the tree nor for ordering. It is either used for filtering, or for fetching. Knowing this makes it easier to extend existing index.

Imagine you have this query to optimize:

SELECT *
FROM …
WHERE a = ?
AND b = ?
ORDER BY ts DESC
LIMIT 10

And find this existing index on that table:

CREATE INDEX … ON … (a, b) INCLUDE (c);

In this case it is a rather safe option to replace the index with the following:

CREATE INDEX … ON … (a, b, ts) INCLUDE (c);

On the other hand, if the original index had all column in the key part, changing it to (A, B, TS, C) is a rather dangerous option.

-markus

— Setup for the demo

-- demo table: 4 rows per block
-- the row if interest is the first one, the others spill to a second block so the problem can also seen in “Buffers"

CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER, junk CHAR(1900));

INSERT INTO tbl VALUES (1, 1, 1, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk'),
(1, 1, 9, 'junk');

-- prevent unwanted plans for demo of Bitmap Index+Heap Scan
SET ENABLE_INDEXSCAN = OFF;
SET ENABLE_SEQSCAN = OFF;

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2019-02-26 20:12:35 Re: No-rewrite timestamp<->timestamptz conversions
Previous Message Andres Freund 2019-02-26 19:46:30 Re: Remove Deprecated Exclusive Backup Mode