Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, Greg Stark <gsstark(at)mit(dot)edu>, Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-26 21:41:20
Message-ID: 426EB580.9040606@dunslane.net
Lists: pgsql-hackers pgsql-performance

Simon Riggs wrote:

>The comment
> * Every value in the sample appeared more than once. Assume
> * the column has just these values.
>doesn't seem to apply when using larger samples, as Josh is using.
>
>Looking at Josh's application it does seem likely that when taking a
>sample, all site visitors clicked more than once during their session,
>especially if they include home page, adverts, images etc for each page.
>
>Could it be that we have overlooked this simple explanation and that the
>Haas and Stokes equation is actually quite good, but just not being
>applied?

No, it is being applied. If every value in the sample appears more than
once, then f1 in the formula (the number of values seen exactly once in
the sample) is 0, and the result is then just d, the number of distinct
values in the sample.
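
To make the arithmetic concrete, here is a minimal sketch, assuming the
estimator takes the form n*d / (n - f1 + f1*n/N), where n is the number of
sampled rows and N the total number of rows in the table. The function name
and the sample figures are made up for illustration and are not taken from
Josh's data.

```python
def haas_stokes_estimate(n, N, d, f1):
    """Estimate the number of distinct values in a table column.

    n  -- number of rows in the sample
    N  -- total number of rows in the table
    d  -- number of distinct values seen in the sample
    f1 -- number of values seen exactly once in the sample
    """
    return n * d / ((n - f1) + f1 * n / N)


# Hypothetical figures: a 3000-row sample from a 1,000,000-row table.
# Every sampled value appeared more than once, so f1 = 0 and the
# denominator is just n: the estimate collapses to d.
all_repeated = haas_stokes_estimate(n=3000, N=1_000_000, d=500, f1=0)
print(all_repeated)  # 500.0

# With some values seen only once, the estimate rises above d.
some_singletons = haas_stokes_estimate(n=3000, N=1_000_000, d=500, f1=100)
print(some_singletons > 500)  # True
```

So the formula is still evaluated in the f1 = 0 case. It simply has nothing
to extrapolate from, which is what the quoted comment describes.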

cheers

andrew
