Boolean column in multicolumn index

From: Dima Pavlov <imyfess(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Boolean column in multicolumn index
Date: 2016-12-10 11:41:48
Message-ID: CAHt_Luuao4gd6De61GryK=2ff-MTgHzjqffdjz02uSdVqYmKKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Test table and indexes:

------------------------

CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer)

INSERT INTO t(cb, ci, co)
SELECT ((round(random()*1))::int)::boolean, round(random()*100),
round(random()*100)
FROM generate_series(1, 1000000)

CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);

-------------------------

The problem is that I can't force PostgreSQL to use the "right" index. The
next query uses the "wrong" index. It's not optimal because it uses
"Filter" (condition: cb = TRUE) and so reads more data from memory (and
execution becomes longer):

explain (analyze, buffers)
SELECT * FROM t WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000

"Limit (cost=0.42..4063.87 rows=1000 width=13) (actual time=0.057..4.405
rows=1000 loops=1)"
" Buffers: shared hit=1960"
" -> Index Scan using wrong on t (cost=0.42..21784.57 rows=5361
width=13) (actual time=0.055..4.256 rows=1000 loops=1)"
" Index Cond: (ci = 46)"
" Filter: cb"
" Rows Removed by Filter: 967"
" Buffers: shared hit=1960"
"Planning time: 0.318 ms"
"Execution time: 4.530 ms"

------------------------------

But when I cast bool column to int, that works fine. This is unclear,
because selectivity of both indexes (right and right_hack) remains the same.

explain (analyze, buffers)
SELECT * FROM t WHERE cb::int = 1 AND ci = 46 ORDER BY co LIMIT 1000

"Limit (cost=0.42..2709.91 rows=1000 width=13) (actual time=0.027..1.484
rows=1000 loops=1)"
" Buffers: shared hit=1003"
" -> Index Scan using right_hack on t (cost=0.42..14525.95 rows=5361
width=13) (actual time=0.025..1.391 rows=1000 loops=1)"
" Index Cond: ((ci = 46) AND ((cb)::integer = 1))"
" Buffers: shared hit=1003"
"Planning time: 0.202 ms"
"Execution time: 1.565 ms"

-----------------------------

Are there any limitations of using boolean column inside multicolumn index?

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2016-12-11 00:04:39 Re: Boolean column in multicolumn index
Previous Message DrakoRod 2016-12-08 03:49:00 Trigger before or after update that change row to another child table of a partitioned table