Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gurmeet Manku <manku(at)CS(dot)Stanford(dot)EDU>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, Greg Stark <gsstark(at)mit(dot)edu>, Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Utkarsh Srivastava <usriv(at)xenon(dot)Stanford(dot)EDU>, snt(at)iastate(dot)edu
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-27 07:45:10
Message-ID: 1114587910.21529.394.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

> 2. In a single scan, it is possible to estimate n_distinct by using
> a very simple algorithm:
>
> "Distinct sampling for highly-accurate answers to distinct value
> queries and event reports" by Gibbons, VLDB 2001.
>
> http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like some more complex changes to the current
algorithm to use it, and we want the other stats as well...

> 3. In fact, Gibbon's basic idea has been extended to "sliding windows"
> (this extension is useful in streaming systems like Aurora / Stream):
>
> "Distributed streams algorithms for sliding windows"
> by Gibbons and Tirthapura, SPAA 2002.
>
> http://home.eng.iastate.edu/~snt/research/tocs.pdf
>

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command

Best Regards, Simon Riggs

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2005-04-27 12:02:21 Re: [HACKERS] Continue transactions after errors in psql
Previous Message Pavel Stehule 2005-04-27 06:27:02 Re: bitmapscan test, no success, bs is not faster

Browse pgsql-performance by date

  From Date Subject
Next Message Yann Michel 2005-04-27 13:31:39 Re: What needs to be done for real Partitioning?
Previous Message Greg Stark 2005-04-27 05:59:30 Re: [PERFORM] Bad n_distinct estimation; hacks suggested?