Skip site navigation (1) Skip section navigation (2)

Bad n_distinct estimation; hacks suggested?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "pgsql-perform" <pgsql-performance(at)postgresql(dot)org>
Subject: Bad n_distinct estimation; hacks suggested?
Date: 2005-04-19 19:09:05
Message-ID: 200504191209.05181.josh@agliodbs.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Folks,

Params:  PostgreSQL 8.0.1 on Solaris 10
Statistics = 500
(tablenames have been changed to protect NDA)

e1=# select tablename, null_frac, correlation, n_distinct from pg_stats where 
tablename = 'clickstream1' andattname = 'session_id';
      tablename       | null_frac | correlation | n_distinct
----------------------+-----------+-------------+------------
 clickstream1         |         0 |    0.412034 |     378174
(2 rows)

e1=# select count(distinct session_id) from clickstream1;
  count
---------
 3174813

As you can see, n_distinct estimation is off by a factor of 10x and it's 
causing query planning problems.   Any suggested hacks to improve the 
histogram on this?

(BTW, increasing the stats to 1000 only doubles n_distinct, and doesn't solve 
the problem)

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Responses

pgsql-performance by date

Next:From: Dave HeldDate: 2005-04-19 20:01:57
Subject: Re: Bad n_distinct estimation; hacks suggested?
Previous:From: Tom LaneDate: 2005-04-19 18:34:35
Subject: Re: Question on REINDEX

pgsql-hackers by date

Next:From: Matthew T. O'ConnorDate: 2005-04-19 19:44:15
Subject: Re: Win32 presentation
Previous:From: Victor Y. YegorovDate: 2005-04-19 19:03:19
Subject: Re: Comparing Datum's at aminsert() stage

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group