Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-perform <pgsql-performance(at)postgresql(dot)org>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-25 19:13:18
Message-ID: 200504251213.18565.josh@agliodbs.com
Lists: pgsql-hackers, pgsql-performance

Simon, Tom:

While it's not possible to get accurate estimates from a fixed-size sample, I
think it would be possible from a small but scalable sample: say, 0.1% of all
data pages on large tables, up to the limit of maintenance_work_mem.

Setting up these samples as a % of data pages, rather than a pure random sort,
makes this more feasible; for example, a 70GB table would only need to sample
about 9000 data pages (or 70MB).  Of course, larger samples would lead to
better accuracy, and this could be set through revised GUCs (e.g.,
maximum_sample_size, minimum_sample_size).
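
As a rough check of the arithmetic above, here is a minimal sketch in Python.
It assumes the default 8 kB block size; the function name and its parameters
are hypothetical and are not existing PostgreSQL settings or code.

```python
PAGE_SIZE = 8192  # assumed default PostgreSQL block size, in bytes


def pages_to_sample(table_bytes, fraction=0.001,
                    maintenance_work_mem_bytes=None, page_size=PAGE_SIZE):
    """Return how many data pages a proportional sample would read.

    fraction=0.001 is the 0.1% figure proposed above. The optional
    maintenance_work_mem_bytes argument caps the sample so that it
    fits in memory. All names here are illustrative only.
    """
    total_pages = table_bytes // page_size
    # Take the requested fraction, but at least one page and never
    # more pages than the table actually has.
    sample_pages = min(total_pages, max(1, round(total_pages * fraction)))
    if maintenance_work_mem_bytes is not None:
        # Cap the sample at the number of pages that fit in memory.
        sample_pages = min(sample_pages,
                           maintenance_work_mem_bytes // page_size)
    return sample_pages


if __name__ == "__main__":
    table = 70 * 1024**3  # a 70GB table
    pages = pages_to_sample(table)
    print(pages, "pages,", pages * PAGE_SIZE / 1024**2, "MB")
```

For a 70GB table this gives 9175 pages, or roughly 72MB, which matches the
"about 9000 pages (or 70MB)" estimate. With a 16MB memory limit the same call
would be capped at 2048 pages.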

I just need a little help doing the math ... please?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
