Re: Re: Indexes with condition using immutable functions applied to column not used

From: Nick Lello <nick(dot)lello(at)rentrakmail(dot)com>
To: sylvain(at)abstraction(dot)fr
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: Indexes with condition using immutable functions applied to column not used
Date: 2011-02-09 15:34:01
Message-ID: AANLkTinv5FjKwP=57xu0AuzJxCMzRxEtd7sTUO-PuTj3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Should there be a Rule for Select to cause partitions to be excluded ?

On 8 February 2011 20:08, Sylvain Rabot <sylvain(at)abstraction(dot)fr> wrote:

> I also tried to do table partitioning using the same immutable function,
> it works well except for constraint exclusion.
>
> CREATE TABLE mike.directory_part_0 () INHERITS (mike.directory) WITH
> (fillfactor = 90);
> CREATE RULE directory_part_0_insert AS ON INSERT TO mike.directory WHERE
> (__mod_cons_hash(new.id_user::bigint, 2) = 0)
> DO INSTEAD INSERT INTO mike.directory_part_0 VALUES (new.*);
>
> CREATE TABLE mike.directory_part_1 () INHERITS (mike.directory) WITH
> (fillfactor = 90);
> CREATE RULE directory_part_1_insert AS ON INSERT TO mike.directory WHERE
> (__mod_cons_hash(new.id_user::bigint, 2) = 1)
> DO INSTEAD INSERT INTO mike.directory_part_1 VALUES (new.*);
>
> mike_part=# explain analyze select * from directory where id_user = 3;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Result (cost=0.00..310.21 rows=5226 width=141) (actual time=0.080..7.583
> rows=2653 loops=1)
> -> Append (cost=0.00..310.21 rows=5226 width=141) (actual
> time=0.077..3.654 rows=2653 loops=1)
> -> Index Scan using directory_id_user_btree_idx on directory
> (cost=0.00..8.27 rows=1 width=141) (actual time=0.007..0.007 rows=0
> loops=1)
> Index Cond: (id_user = 3)
> -> Index Scan using directory_part_0_id_user_btree_idx on
> directory_part_0 directory (cost=0.00..8.27 rows=1 width=150) (actual
> time=0.035..0.035 rows=0 loops=1)
> Index Cond: (id_user = 3)
> -> Index Scan using directory_part_1_id_user_btree_idx on
> directory_part_1 directory (cost=0.00..293.67 rows=5224 width=141) (actual
> time=0.035..2.037 rows=2653 loops=1)
> Index Cond: (id_user = 3)
> Total runtime: 8.807 ms
> (9 rows)
>
>
> On Tue, 2011-02-08 at 01:14 +0100, Sylvain Rabot wrote:
> > Hi,
> >
> > I am trying to understand how indexes works to get the most of them.
> >
> > First I would like to know if there is more advantage than overhead to
> > split an index in several ones using conditions e.g. doing :
> >
> > CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree
> (id_user) WHERE id_user < 250000;
> > CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user >= 250000 AND id_user < 500000;
> > CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user >= 500000 AND id_user < 750000;
> > CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user >= 750000 AND id_user < 1000000;
> >
> > instead of having only one index for all the id_user. the forecasts for
> > the table directory are +500 millions records and something like 1
> > million distinct id_user.
> >
> > If there is my idea was to do a repartition in the indexes using a
> > consistent hash algorithm in order to fill the indexes in parallel
> > instead of successively :
> >
> > CREATE OR REPLACE FUNCTION mike.__mod_cons_hash(
> > IN in_dividend bigint,
> > IN in_divisor integer,
> > OUT remainder integer
> > ) AS $__$
> >
> > BEGIN
> > SELECT in_dividend % in_divisor INTO remainder;
> > END;
> >
> > $__$ LANGUAGE plpgsql IMMUTABLE COST 10;
> >
> > CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON
> mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 0;
> > CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON
> mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 1;
> > CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON
> mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 2;
> > CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON
> mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 3;
> >
> > But the thing is the indexes are not used :
> >
> > mike=# SELECT version();
> > version
> >
> -------------------------------------------------------------------------------------------------------------------
> > PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real
> (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
> > (1 row)
> >
> > mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx;
> > LOG: duration: 14644.160 ms statement: REINDEX INDEX
> > directory_id_user_mod_cons_hash_0_btree_idx;
> > REINDEX
> > mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4;
> > QUERY PLAN
> >
> ----------------------------------------------------------------------------------------------------------------
> > Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual
> time=0.077..348.211 rows=10303 loops=1)
> > Filter: (id_user = 4)
> > Total runtime: 351.114 ms
> > (3 rows)
> >
> > So I also did this test :
> >
> > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user > 3 and id_user < 5;
> > CREATE INDEX
> > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
> > QUERY
> PLAN
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------
> > Index Scan using directory_id_user_4_btree_idx on directory
> (cost=0.00..10.58 rows=67 width=148) (actual time=0.169..7.753 rows=10303
> loops=1)
> > Index Cond: (id_user = 4)
> > Total runtime: 10.973 ms
> > (3 rows)
> >
> > mike=# DROP INDEX directory_id_user_4_btree_idx;
> > DROP INDEX
> > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user - 1 > 2 and id_user + 1 < 6;
> > CREATE INDEX
> > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
> > QUERY PLAN
> >
> ----------------------------------------------------------------------------------------------------------------
> > Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual
> time=0.153..360.020 rows=10303 loops=1)
> > Filter: (id_user = 4)
> > Total runtime: 363.106 ms
> > (3 rows)
> >
> > mike=# DROP INDEX directory_id_user_4_btree_idx;
> > DROP INDEX
> > mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING
> btree (id_user) WHERE id_user > 2 + 1 and id_user < 6 - 1;
> > CREATE INDEX
> > mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
> > QUERY
> PLAN
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------
> > Index Scan using directory_id_user_4_btree_idx on directory
> (cost=0.00..10.58 rows=67 width=148) (actual time=0.245..8.262 rows=10303
> loops=1)
> > Index Cond: (id_user = 4)
> > Total runtime: 11.110 ms
> > (3 rows)
> >
> > As you see the index condition although, differently written, is the
> > same but the second index is not used apparently because the immutable
> > function is applied on the column.
> >
> > So do you know the reason why the planner is not able to use indexes
> > which have immutable functions applied to the column in their
> > condition ?
> >
> > Regards.
> >
>
> --
> Sylvain Rabot <sylvain(at)abstraction(dot)fr>
>

--

Nick Lello | Web Architect
o +44 (0) 8433309374 | m +44 (0) 7917 138319
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Frost 2011-02-09 19:40:08 Re: [PERFORM] pgbench to the MAXINT
Previous Message Vitalii Tymchyshyn 2011-02-09 09:52:44 Re: Bad query plan when the wrong data type is used