Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-24 19:08:15
Message-ID: 200504241208.15437.josh@agliodbs.com
Lists: pgsql-hackers pgsql-performance

Folks,

> I wonder if this paper has anything that might help:
> http://www.stat.washington.edu/www/research/reports/1999/tr355.ps - if I
> were more of a statistician I might be able to answer :-)

Actually, that paper looks *really* promising. Does anyone here have enough
math to solve for D(sub)Md on page 6? I'd like to test it on samples smaller
than 0.01%.

Tom, how does our heuristic sampling work? Is it pure random sampling, or
page sampling?
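The distinction matters for n_distinct. If rows with equal values are physically clustered, reading whole pages returns many duplicates, so a page sample sees far fewer distinct values than a row sample of the same size. Below is a minimal sketch under assumed conditions: a hypothetical table layout with 100 rows per page, where each value is stored in 10 consecutive rows. The names `build_table`, `row_sample` and `page_sample` are illustrative only and are not PostgreSQL code.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical page capacity
RUN_LENGTH = 10      # hypothetical: each value occupies 10 consecutive rows (clustered)


def build_table(num_pages):
    """Return the column as a list of pages, each page a list of values."""
    n_rows = num_pages * ROWS_PER_PAGE
    column = [i // RUN_LENGTH for i in range(n_rows)]
    return [column[p * ROWS_PER_PAGE:(p + 1) * ROWS_PER_PAGE]
            for p in range(num_pages)]


def row_sample(pages, sample_size, rng):
    """Pure random sampling: every row is equally likely, regardless of its page."""
    all_rows = [v for page in pages for v in page]
    return rng.sample(all_rows, sample_size)


def page_sample(pages, sample_size, rng):
    """Page sampling: choose whole pages at random and keep every row on them."""
    n_pages = sample_size // ROWS_PER_PAGE
    chosen = rng.sample(pages, n_pages)
    return [v for page in chosen for v in page]


if __name__ == "__main__":
    rng = random.Random(42)
    pages = build_table(num_pages=10_000)  # 1,000,000 rows, 100,000 distinct values
    sample_size = 3_000
    print("true distinct:", len({v for page in pages for v in page}))
    print("row sample distinct:", len(set(row_sample(pages, sample_size, rng))))
    print("page sample distinct:", len(set(page_sample(pages, sample_size, rng))))
```

In this layout, the 3,000-row page sample contains exactly 300 distinct values (30 pages with 10 values each), while the row sample sees close to 3,000. Any estimator fed page samples would have to correct for that clustering.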

--
Josh Berkus
Aglio Database Solutions
San Francisco
