Citation for "Bad n_distinct estimation; hacks suggested?"

From: Gurmeet Manku <manku(at)CS(dot)Stanford(dot)EDU>
To: Gurmeet Manku <manku(at)xenon(dot)Stanford(dot)EDU>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Citation for "Bad n_distinct estimation; hacks suggested?"
Date: 2005-05-02 16:14:00
Message-ID: Pine.LNX.4.44.0505020858580.28631-100000@xenon.Stanford.EDU
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


Actually, the earliest paper that solves the distinct_n estimation
problem in 1 pass is the following:

"Estimating simple functions on the union of data streams"
by Gibbons and Tirthapura, SPAA 2001.
http://home.eng.iastate.edu/~snt/research/streaming.pdf

The above paper addresses a more difficult problem (1 pass
_and_ a distributed setting).

Gibbon's followup paper in VLDB 2001 limits the problem to a
single machine and contains primarily experimental results (for
a database audience). The algorithmic breakthrough had already been
accomplished in the SPAA paper.

Gurmeet

--
----------------------------------------------------
Gurmeet Singh Manku Google Inc.
http://www.cs.stanford.edu/~manku (650) 967 1890
----------------------------------------------------

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2005-05-02 16:48:07 Re: [HACKERS] Increased company involvement
Previous Message Heikki Linnakangas 2005-05-02 15:47:14 Re: Feature freeze date for 8.1

Browse pgsql-performance by date

  From Date Subject
Next Message Chris Browne 2005-05-02 16:16:42 Re: batch inserts are "slow"
Previous Message David Parker 2005-05-02 15:53:33 Re: batch inserts are "slow"