Re: On Distributions In 7.2.1

From: Mark kirkwood <markir(at)slingshot(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: On Distributions In 7.2.1
Date: 2002-05-02 09:36:37
Message-ID: 1020332199.3822.16.camel@spikey.slithery.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

err - some of my puzzlement is explained by this (silly) error :

there are 3000 distinct values, each with 1000 occurrences (*not* 3000)
so the frequency I am "chasing" is 0.00033... ( not 0.001)

so for 10 (quantiles) we see ~ 0.0016 (too big)
for 30 seeing ~ 0.001 (too big but closer)
for 100 seeing ~ 0.00066 (still too big but closer)

so actually its "creeping" closer (rather than oscillating), which seems
a much healthier situation.

However Tom's observation is still valid (in spite of my math) - all the
frequencies are overestimated, rather than the expected "some bigger,
some smaller" sort of thing.

I will do some more ANALYZE runs and see what happens...

(then the log distribution could be fun)

regards

Mark

>On Thu, 2002-05-02 at 17:00, Tom Lane wrote:
> Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> writes:
> > There is slightly odd behaviour with the frequencies decreasing with
> > increasing number of quantiles (same as 7.2 .. same code here ?).
>
> That does seem curious. With the inevitable sampling error, you'd
> expect that some values would be sampled at a bit more than their
> true frequency, and others at a bit less. The oversampled ones would
> be the ones to get into the MCV list. But what you've got here is
> that even the most-commonly-sampled value showed up at a bit less
> than its true frequency. Is this repeatable if you do ANALYZE over
> and over? Maybe it was just a statistical fluke.
>
> > I am wondering if this is caused by my example not having any "real" most
> > common values (they are all as common as each other).
> > I am going to fiddle with my data generation script, skew the
> > distribution and see what effect that has.
>
> Someone else reported some results that made it look like a logarithmic
> frequency distribution was a difficult case for the stats gatherer:
> http://archives.postgresql.org/pgsql-general/2002-03/msg01300.php
> So please be sure to try that.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jean-Michel POURE 2002-05-02 09:58:53 Re: HOWTO - Random character generation for primary key
Previous Message Vince Vielhaber 2002-05-02 09:34:19 Re: Fwd: Postfix Relay Hub SMTP server: errors from