Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
Date: 2011-02-17 21:24:52
Message-ID: 4D5D9224.7080105@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All.

The NULL element always suprises me in unpleasant ways.. my brain simply
cant really understand the logic, so please let me know if this is
one of the cases where I just should spend way more efforts into fixing
that instead.

I have a table with a "null_frac" of 0.5 and i have tested that a where
clause that evaluates to null isnt included in the result:

testdb=# select id from testtable where null @@ to_tsquery('testterm80');
id
----
(0 rows)

Then I'd expect to have the null_fraq taken into account when computing the
estimates for the query:

testdb=# explain select id from testtable where fts @@
to_tsquery('testterm80');
QUERY PLAN
---------------------------------------------------------------
Seq Scan on testtable (cost=0.00..1985.03 rows=1966 width=4)
Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)

Whereas it actually does it if I explicitly add the "fts is not null"
clause to the query.

testdb=# explain select id from testtable where fts @@
to_tsquery('testterm80') and fts is not null;
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on testtable (cost=130.34..1735.19 rows=983 width=4)
Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS
NOT NULL))
-> Bitmap Index Scan on testtable_fts_idx (cost=0.00..130.09
rows=983 width=0)
Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts
IS NOT NULL))
(4 rows)

When something evaluates to "null" isn't included in the result,
shouldn't the query-planner
then take the null_frac into account when computing the estimate?

Trying to do the same thing using integers and the < operator seem to
take the null_frac into
account.

Below snippet allows to reproduce the dataset.

create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);
CREATE OR REPLACE FUNCTION public.filltable(rows integer)
RETURNS boolean
LANGUAGE plpgsql
AS $function$
DECLARE
count integer;
BEGIN
count := 0;
LOOP
EXIT WHEN count = rows;
count := count +1;
insert into testtable(document,fts) select
document,to_tsvector('english',document) from (select
string_agg(concat,' ') as document from (select concat('testterm' ||
generate_series(1,floor(random()*100)::integer))) as foo) as bar;
END LOOP;
RETURN TRUE;
END;
$function$

select filltable(10000);
testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 1835 of 1835 pages, containing 10002 live
rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
null_frac
-----------
0.5
(1 row)

... trying with integers:

testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2186 of 2186 pages, containing 10002 live
rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id %2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO: analyzing "public.testtable"
INFO: "testtable": scanned 2282 of 2282 pages, containing 10002 live
rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
analyzze ANALYZE
testdb=# explain select id from testtable where testint = 50;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=64 width=4)
Filter: (testint = 50)
(2 rows)

testdb=# explain select id from testtable where testint = 1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=48 width=4)
Filter: (testint = 1)
(2 rows)

testdb=# explain select id from testtable where testint < 50;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on testtable (cost=0.00..2407.03 rows=2470 width=4)
Filter: (testint < 50)
(2 rows)

(found on 8.4 and reproduced on git-head)

Attached patch tries to align the behaviour

Thanks.

--
Jesper

Attachment Content-Type Size
0001-Take-null_frac-into-account-in-calculating-selectivi.patch text/x-patch 0 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-02-17 21:42:32 Re: Replication server timeout patch
Previous Message Simon Riggs 2011-02-17 21:21:23 Re: Replication server timeout patch