Indexes with condition using immutable functions applied to column not used

From: Sylvain Rabot <sylvain(at)abstraction(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Cc: Sylvain Rabot <sylvain(at)abstraction(dot)fr>
Subject: Indexes with condition using immutable functions applied to column not used
Date: 2011-02-08 00:14:58
Message-ID: 1297124098.2549.65.camel@kheops
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am trying to understand how indexes works to get the most of them.

First I would like to know if there is more advantage than overhead to
split an index in several ones using conditions e.g. doing :

CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree (id_user) WHERE id_user < 250000;
CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 250000 AND id_user < 500000;
CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 500000 AND id_user < 750000;
CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 750000 AND id_user < 1000000;

instead of having only one index for all the id_user. the forecasts for
the table directory are +500 millions records and something like 1
million distinct id_user.

If there is my idea was to do a repartition in the indexes using a
consistent hash algorithm in order to fill the indexes in parallel
instead of successively :

CREATE OR REPLACE FUNCTION mike.__mod_cons_hash(
IN in_dividend bigint,
IN in_divisor integer,
OUT remainder integer
) AS $__$

BEGIN
SELECT in_dividend % in_divisor INTO remainder;
END;

$__$ LANGUAGE plpgsql IMMUTABLE COST 10;

CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 0;
CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 1;
CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 2;
CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 3;

But the thing is the indexes are not used :

mike=# SELECT version();
version
-------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
(1 row)

mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx;
LOG: duration: 14644.160 ms statement: REINDEX INDEX
directory_id_user_mod_cons_hash_0_btree_idx;
REINDEX
mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual time=0.077..348.211 rows=10303 loops=1)
Filter: (id_user = 4)
Total runtime: 351.114 ms
(3 rows)

So I also did this test :

mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 3 and id_user < 5;
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using directory_id_user_4_btree_idx on directory (cost=0.00..10.58 rows=67 width=148) (actual time=0.169..7.753 rows=10303 loops=1)
Index Cond: (id_user = 4)
Total runtime: 10.973 ms
(3 rows)

mike=# DROP INDEX directory_id_user_4_btree_idx;
DROP INDEX
mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user - 1 > 2 and id_user + 1 < 6;
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual time=0.153..360.020 rows=10303 loops=1)
Filter: (id_user = 4)
Total runtime: 363.106 ms
(3 rows)

mike=# DROP INDEX directory_id_user_4_btree_idx;
DROP INDEX
mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 2 + 1 and id_user < 6 - 1;
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using directory_id_user_4_btree_idx on directory (cost=0.00..10.58 rows=67 width=148) (actual time=0.245..8.262 rows=10303 loops=1)
Index Cond: (id_user = 4)
Total runtime: 11.110 ms
(3 rows)

As you see the index condition although, differently written, is the
same but the second index is not used apparently because the immutable
function is applied on the column.

So do you know the reason why the planner is not able to use indexes
which have immutable functions applied to the column in their
condition ?

Regards.

--
Sylvain Rabot <sylvain(at)abstraction(dot)fr>

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2011-02-08 01:39:45 Re: Write-heavy pg_stats_collector on mostly idle server
Previous Message Craig Ringer 2011-02-07 23:55:56 Re: Really really slow select count(*)