Bad row estimation with indexed func returning bool

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bad row estimation with indexed func returning bool
Date: 2015-09-11 11:33:03
Message-ID: 20150911133303.65895700@erg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I faced a correlation problem on a query today and tried the usual trick
consisting of using an functional index and rewriting the query to use it.

However, after writing the function, indexing it and rewriting the query, I
faced an optimizer behavior I was not expecting. I wrote a short scenario to
reproduce it on current HEAD:

CREATE TABLE correl AS SELECT (i-1)%2 AS i, i%2 AS j
FROM generate_series(1,100000) AS i;

ANALYZE correl ;

EXPLAIN ANALYZE SELECT * FROM correl WHERE i = 1 AND j = 1;

-- Seq Scan on correl (rows=25000) (rows=0)
-- Filter: ((i = 1) AND (j = 1))
-- Rows Removed by Filter: 100000
-- Planning time: 0.356 ms
-- Execution time: 21.937 ms

CREATE FUNCTION fix_correl(int, int) RETURNS bool AS
'BEGIN RETURN $1 = 1 AND $2 = 1; END '
IMMUTABLE
CALLED ON NULL INPUT
LANGUAGE plpgsql;

CREATE INDEX ON correl ( fix_correl(i, j) );

ANALYZE correl ;

EXPLAIN ANALYZE SELECT * FROM correl WHERE fix_correl(i, j);

-- Index Scan using correl_fix_correl_idx on correl (rows=33333) (rows=0)
-- Index Cond: (fix_correl(i, j) = true)
-- Filter: fix_correl(i, j)
-- Planning time: 0.421 ms
-- Execution time: 0.102 ms

Using a function returning integer work as expected:

CREATE FUNCTION fix_correl_add(int, int) RETURNS int AS
'BEGIN RETURN $1 + $2 ; END '
IMMUTABLE
CALLED ON NULL INPUT
LANGUAGE plpgsql;

CREATE INDEX ON correl ( fix_correl_add( i, j ) );

ANALYZE correl ;

EXPLAIN ANALYZE SELECT * FROM correl WHERE fix_correl_add( i, j ) = 2;
-- Index Scan using correl_fix_correl_add_idx on correl (rows=1) (rows=0)
-- Index Cond: (fix_correl_add(i, j) = 2)
-- Planning time: 0.462 ms
-- Execution time: 0.102 ms

It works works as expected with a simple index on (i + j) with no function, but
I wanted to have the same conditions in both tests.

Why does the optimizer behave differently in both cases? Why do it add a Filter
when index scan-ing on correl_fix_correl_idx indexing booleans?

Please, find the complete scenario in attachment.

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com

Attachment Content-Type Size
correl_bool_scenario.txt text/plain 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2015-09-11 11:33:15 Re: RLS open items are vague and unactionable
Previous Message Pavel Stehule 2015-09-11 10:25:10 Re: proposal: function parse_ident