From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Statistics and selectivity estimation for ranges |
Date: | 2012-08-14 06:45:58 |
Message-ID: | CAPpHfdudkYhJkGbhQtbjt6GG5Owp_v7_kpQKA=1Dz5Fb8iGteA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Aug 13, 2012 at 1:11 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:
> On Thu, Aug 9, 2012 at 12:44 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:
>
>> My conclusion is so, that current errors are probably ok for selectivity
>> estimation. But taking into attention that generated datasets ideally fits
>> assumptions of estimation, there could be room for improvement. Especially,
>> it's unclear why estimate for "<@" and "@>" have much greater error than
>> estimate for "&&". Possibly, it's caused by some bugs.
>>
>
> ITSM, I found reason of inaccuracy. Implementation of linear interpolation
> was wrong. Fixed version is attached. Now, need to rerun tests, possible
> refactoring and comments rework.
>
After fixing few more bugs, I've a version with much more reasonable
accuracy.
Statistics target = 100.
Relatively large result sets (>= 10)
test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 100 and actual_count >= 10 group by
operator;
operator | avg_ratio | avg_error
----------+------------------+--------------------
<@ | 1.00404179116863 | 0.0504415454560903
@> | 1.06364108531688 | 0.105646077989812
&& | 1.00757984721409 | 0.0420984234933233
(3 rows)
Small result sets (1 - 9)
test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 100 and actual_count between 1 and 9 group
by
operator;
operator | avg_ratio | avg_error
----------+------------------+-------------------
<@ | 1.31530838062865 | 0.654886592410495
@> | 2.78708078320147 | 1.94124123003433
&& | 1.93268112525538 | 1.09904919063335
(3 rows)
Empty result sets
test=# select operator, avg(estimate_count) as avg_estimate, count(*) as
tests_count from datasets d join test_results tr on tr.test_id = d.id where
d.stat_target = 100 and actual_count = 0 group by operator;
operator | avg_estimate | tests_count
----------+--------------------+-------------
<@ | 1.1437670609645132 | 1099
@> | 1.0479430126460701 | 87458
(2 rows)
Statistics target = 1000.
Relatively large result sets (>= 10)
test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 1000 and actual_count >= 10 group by
operator;
operator | avg_ratio | avg_error
----------+------------------+--------------------
<@ | 1.00073999445381 | 0.045099762607524
@> | 1.05296320350853 | 0.0907489633452971
&& | 1.00217602359039 | 0.0353421159150165
(3 rows)
Small result sets (1 - 9)
test=# select operator, avg(estimate_count::float8/actual_count::float8) as
avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) -
1.0 as avg_error from datasets d join test_results tr on tr.test_id =
d.idwhere d.stat_target = 1000 and actual_count between 1 and 9 group
by
operator;
operator | avg_ratio | avg_error
----------+------------------+-------------------
<@ | 1.26946358795998 | 0.577803898836364
@> | 2.69000633430211 | 1.83165424646645
&& | 1.48715184186882 | 0.577998652291105
(3 rows)
Empty result sets
test=# select operator, avg(estimate_count) as avg_estimate, count(*) as
tests_count from datasets d join test_results tr on tr.test_id = d.id where
d.stat_target = 1000 and actual_count = 0 group by operator;
operator | avg_estimate | tests_count
----------+--------------------+-------------
<@ | 1.0887096774193548 | 1364
@> | 1.0423876983771183 | 89224
&& | 5.0000000000000000 | 1
(3 rows)
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
range_stat-0.6.patch.gz | application/x-gzip | 10.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2012-08-14 07:46:04 | patch: shared session variables |
Previous Message | Kevin Grittner | 2012-08-14 05:27:20 | Re: default_isolation_level='serializable' crashes on Windows |