Re: Redundant bitmap index scans on smallint column

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redundant bitmap index scans on smallint column
Date: 2011-09-05 18:01:46
Message-ID: 10968.1315245706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> This simple query shouldn't cause two bitmap index scans:
> EXPLAIN select * from test where b='0';

> Bitmap Heap Scan on test (cost=1056.68..8200.12 rows=29839 width=314)
> Recheck Cond: ((b = 0) AND (b = 0::smallint))
> -> BitmapAnd (cost=1056.68..1056.68 rows=5237 width=0)
> -> Bitmap Index Scan on test_i_idx (cost=0.00..485.45
> rows=29839 width=0)
> -> Bitmap Index Scan on test_b_c_idx (cost=0.00..556.06
> rows=29839 width=0)
> Index Cond: (b = 0::smallint)

> One of the indexes is a partial index, and the other is just a simple index.

> Apparently, for some reason, the '0' is expanded into both an integer
> and a smallint literal and the planner thinks it can reduce rows by
> checking the condition twice?

Yeah, this happens because choose_bitmap_and() compromises between
planning speed and exact detection of redundant index conditions.
What we have to start with is WHERE b = 0::smallint, which the planner
is able to prove implies the index predicate WHERE b = 0::integer,
so both indexes are considered. But the check for predicate redundancy
in choose_bitmap_and() only uses simple equality not provability,
so it does not recognize that the two indexes are entirely redundant.

I'm not really eager to change that, especially in view of the fact
that a plain (non bitmap) indexscan is considerably cheaper than any
of these alternatives in this example.

I did have an idea while looking at this example --- namely, that
we could provide some further protection cheaply with this simple hack
in cost_bitmap_heap_scan:

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7812a8628fc94335aaf1f506c4ea5ebb7960f8d8..c001725267a06063f45bbcde0b09f5784b0f5c3a 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** cost_bitmap_heap_scan(Path *path, Planne
*** 607,612 ****
--- 607,622 ----
*/
tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);

+ /*
+ * Disbelieve an estimate that's less than what we previously estimated
+ * for the actual number of rows needed from the table. This can happen
+ * when we are considering a bitmap AND of indexes with redundant
+ * conditions, since it's difficult for the selectivity code to recognize
+ * the redundancy. By clamping the cost estimate this way, we prevent
+ * redundant AND scans from looking cheaper than non-redundant ones.
+ */
+ tuples_fetched = Max(tuples_fetched, baserel->rows);
+
T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;

if (outer_rel != NULL && outer_rel->rows > 1)

I tested this and it fixes this particular example, by preventing the
heap scan part of the plan from looking cheaper than it does with just
one index in use. (The index scan part is of course more expensive
with extra indexes, so possibilities with extra indexes will lose out.)

It'd be nice to imagine that this quick and dirty solution obsoletes
the need for most of the expensive heuristics in choose_bitmap_and,
but I'm afraid it probably does not. baserel->rows might include the
effects of some non-index-related WHERE conditions, so the clamp here
is not tight. Still, it should fix egregious cases like this one,
so I'm inclined to apply it.

> Reproduced on PostgreSQL 8.3.15, 8.4.8, 9.0.4, 9.1rc1 and 9.2devel.
> However, this issue does NOT occur on 8.2.21

8.2 doesn't recognize the partial index as applicable (for lack of
enough understanding of cross-type operator relationships), so it
doesn't reach the problematic decision.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Colson 2011-09-05 18:02:46 Re: regular logging of checkpoint progress
Previous Message Andy Colson 2011-09-05 17:56:04 savepoint commit performance