Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: 2005-05-03 13:06:23
Message-ID: 4277774F.7040205@logix-tt.com
Lists: pgsql-hackers pgsql-performance

Hi, Josh,

Josh Berkus wrote:

> Yes, actually. We need 3 different estimation methods:
> 1 for tables where we can sample a large % of pages (say, >= 0.1)
> 1 for tables where we sample a small % of pages but are "easily estimated"
> 1 for tables which are not easily estimated but we can't afford to sample a
> large % of pages.
>
> If we're doing sampling-based estimation, I really don't want people to lose
> sight of the fact that page-based random sampling is much less expensive than
> row-based random sampling. We should really be focusing on methods which
> are page-based.
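
A rough sketch of why page-based sampling is so much cheaper. It assumes every page holds the same number of rows and that row-based sampling draws rows uniformly with replacement (both simplifications), and the table size, row density and sample size are made-up illustration values, not figures from this thread:

```python
import math


def pages_touched_row_sampling(n_rows_sampled: int, n_pages: int) -> float:
    """Expected number of distinct pages read when sampling individual rows.

    Assumes equal rows per page and draws with replacement, so each draw
    lands on a uniformly random page.
    """
    return n_pages * (1.0 - (1.0 - 1.0 / n_pages) ** n_rows_sampled)


def pages_read_page_sampling(n_rows_wanted: int, rows_per_page: int) -> int:
    """Pages needed when whole pages are sampled and every row on them is used."""
    return math.ceil(n_rows_wanted / rows_per_page)


n_pages = 1_000_000   # hypothetical table size in pages
rows_per_page = 100   # hypothetical row density
sample = 3_000        # hypothetical sample size in rows

print(f"row-based:  ~{pages_touched_row_sampling(sample, n_pages):.0f} pages")
print(f"page-based: {pages_read_page_sampling(sample, rows_per_page)} pages")
```

With these hypothetical numbers, row-based sampling touches roughly 3,000 pages while page-based sampling reads 30. The catch is that rows sharing a page are often correlated (clustered values), so a page-based estimator has to correct for that rather than treat the rows as independent.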

Would it make sense to have a sampling method that scans indices? I think
that, at least for tree-based indices (btree, gist), rather good
estimates could be derived.
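
To illustrate why an ordered index helps: a btree returns keys in sorted order, so equal keys are adjacent and distinct values can be counted with one comparison per key, without hashing or sorting. The function below is a hypothetical sketch over a plain Python iterable standing in for an in-order leaf scan; it uses no PostgreSQL API, and a real sampling variant that reads only some leaf pages would still need an extrapolation step not shown here.

```python
from typing import Iterable

_NO_PREVIOUS = object()  # sentinel: no key seen yet


def distinct_in_index_order(keys: Iterable) -> int:
    """Count distinct keys in a stream that is already sorted.

    Stands in for an in-order btree leaf scan (hypothetical). Because equal
    keys are adjacent, a key is new exactly when it differs from the previous one.
    """
    count = 0
    previous = _NO_PREVIOUS
    for key in keys:
        if previous is _NO_PREVIOUS or key != previous:
            count += 1
        previous = key
    return count


print(distinct_in_index_order([1, 1, 2, 3, 3, 3, 7]))  # 4
```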

And the presence of a unique index should yield an estimate of 100%
distinct values without any scan at all.
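
A sketch of the unique-index shortcut, expressed in the n_distinct convention of PostgreSQL's pg_stats view, where a negative value is the negated ratio of distinct values to rows, so -1 means every row is distinct. Assumptions: the unique index covers this single column and is not partial (a multi-column unique index says nothing about one column on its own), and NULLs, which a unique index allows to repeat, are not counted as values. The function name and parameters are hypothetical.

```python
from typing import Optional


def n_distinct_estimate(has_unique_index: bool, null_frac: float,
                        sampled_estimate: Optional[float] = None) -> float:
    """Hypothetical helper returning n_distinct in pg_stats' convention.

    Positive result: absolute number of distinct non-NULL values.
    Negative result: -(distinct values / total rows); -1.0 means all rows differ.
    """
    if not 0.0 <= null_frac <= 1.0:
        raise ValueError("null_frac must be between 0 and 1")
    if has_unique_index:
        # Assumes a single-column, non-partial unique index: every non-NULL
        # row then carries its own value, so no scan or sample is needed.
        return -(1.0 - null_frac)
    if sampled_estimate is None:
        # Without a unique index we must fall back to a sample-based estimate.
        raise ValueError("no unique index: a sampled estimate is required")
    return sampled_estimate


print(n_distinct_estimate(True, 0.0))    # -1.0
print(n_distinct_estimate(True, 0.25))   # -0.75
```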

Markus
