Re: Statistics and selectivity estimation for ranges

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Statistics and selectivity estimation for ranges
Date: 2012-12-10 19:21:44
Message-ID: 1355167304.3896.37.camel@jdavis
Lists: pgsql-hackers

It looks like there are still some problems with this patch.

CREATE TABLE foo(ir int4range);
INSERT INTO foo SELECT 'empty' FROM generate_series(1,10000);
INSERT INTO foo SELECT int4range(NULL, g, '(]')
  FROM generate_series(1,1000000) g;
INSERT INTO foo SELECT int4range(g, NULL, '[)')
  FROM generate_series(1,1000000) g;
INSERT INTO foo SELECT int4range(g, ((g*1.01)+10)::int4, '[]')
  FROM generate_series(1,1000000) g;
CREATE TABLE bar(ir) AS SELECT * FROM foo ORDER BY random();
ANALYZE bar;

Now:
EXPLAIN ANALYZE SELECT * FROM bar
WHERE ir @> int4range(10000,20000);

The row estimates come out as "-nan". The same happens for many other queries.
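
For illustration only, here is a minimal C sketch of one way a NaN can creep into a histogram-based estimate. It is not taken from the patch, and the cause of the "-nan" above has not been traced. The helper names (bin_fraction, bin_fraction_guarded, clamp_selectivity) are hypothetical. The assumption is that a bin fraction is computed as (value - lo) / (hi - lo): a zero-width bin gives 0.0 / 0.0, and a bin whose bounds are both infinite (as could happen with lengths of unbounded ranges) gives inf - inf, and both are NaN.

```c
#include <math.h>

/*
 * Hypothetical helper: position of "value" inside the histogram bin
 * [lo, hi], as a fraction between 0 and 1.
 *
 * With lo == hi == value this evaluates 0.0 / 0.0, and with
 * lo == hi == INFINITY it evaluates inf - inf. Both yield NaN.
 */
static double
bin_fraction(double lo, double hi, double value)
{
    return (value - lo) / (hi - lo);
}

/*
 * Same computation with the degenerate cases handled explicitly.
 * Returning 0.5 for a bin without a usable width is an arbitrary
 * choice made for this sketch.
 */
static double
bin_fraction_guarded(double lo, double hi, double value)
{
    if (isinf(lo) || isinf(hi) || hi <= lo)
        return 0.5;
    return (value - lo) / (hi - lo);
}

/*
 * Last line of defense: map NaN to a fallback selectivity and clamp
 * everything else into [0, 1].
 */
static double
clamp_selectivity(double sel, double fallback)
{
    if (isnan(sel))
        return fallback;
    if (sel < 0.0)
        return 0.0;
    if (sel > 1.0)
        return 1.0;
    return sel;
}
```

Once a NaN appears it propagates through every later addition and multiplication, so a single degenerate bin is enough to turn the final estimate into NaN. Guarding at the point of division is easier to reason about than relying only on a final clamp.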

And I have a few other questions/comments:

* Why is "summ" spelled with two "m"s? Is it short for "summation"? If
so, it might be better to say "summation of" instead of "integrate" in
the comment.

* Why does get_length_hist_frac return 0.0 when i is the last value? Is
that a mistake?

* I am still confused by the distinction between rbound_bsearch and
rbound_bsearch_bin. What is the intuitive purpose of each?

* You use "constant value" in the comments in several places. Would
"query value" or "search key" be better?

Regards,
Jeff Davis
