statistics

From: Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To: PostgreSQL Hackers list <pgsql-hackers(at)hub(dot)org>
Subject: statistics
Date: 2000-08-23 20:13:55
Message-ID: Pine.LNX.4.21.0008232034030.6540-100000@eros.si.fct.unl.pt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I've made a small (2 hours work) program to make "histograms" on data.
Giving a table and column he tries to put in buckets special values
(special means the most/least used values). It tries to be a little smart,
ie, if there are lots of guys with most/least values it will not put them
in buckets.

Example:
bucket size=10
$ ./a.out "dbname=teste" d_pags uid
Distinct values: 1028
Number of tuples: 6880
On a uniform distribution: 6.692607 tuples/value

# of values with more references
1 - 1
2 - 1
3 - 1
[...]
This means that there is only one value as the most referenced (110 times)
# of values with less references
1 - 253
2 - 153
3 - 109
[..]
This means that there ara 253 values that have the least references (once)
Best case buckets
1 - u805156 (110)
2 - u1503927 (103)
3 - u110525 (82)
4 - u91106009 (78)
5 - u1106837 (60)
6 - u1714112 (55)
7 - u1414335 (53)
8 - u1105732 (50)
9 - u302719 (49)
Worst case buckets
[there are so many guys with only one ref... nothing can be put in
buckets]
Removed values: 9
Removed tuples: 640
Expected tuples for each of unclassified values: 6.123651 tuples/value

If all values have equal prob of being chosen:
Error in normal case: 5.464981
Error in hist case: 4.905326

If my calculations are right:
For 9 of 1028 values (9.5% of relation) there is a very precise idea,
but in general (for a random value being selected) that's not a great
advance. At least for this case ...

Tiago

Browse pgsql-hackers by date

  From Date Subject
Next Message Vince Vielhaber 2000-08-23 20:47:47 How do pronounce PostgreSQL - the final word.
Previous Message Ross J. Reedstrom 2000-08-23 17:55:18 Re: [HACKERS] when does CREATE VIEW not create a view?