Re: Bad n_distinct estimation; hacks suggested?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad n_distinct estimation; hacks suggested?
Date: 2005-04-22 20:36:08
Message-ID: 200504221336.08325.josh@agliodbs.com
Lists: pgsql-hackers pgsql-performance


> > Solaris is unknown to me. Maybe the used random number generator there
> > isn't good enough?
>
> Hmmm. Good point. Will have to test on Linux.

Nope:

Linux 2.4.20:

test=# select tablename, attname, n_distinct from pg_stats where tablename = 'web_site_activity_fa';
      tablename       |       attname       | n_distinct
----------------------+---------------------+------------
 web_site_activity_fa | session_id          |     626127

test=# select count(distinct session_id) from web_site_activity_fa;
  count
---------
 3174813
(1 row)

... so the estimate is about five times too low on Linux as well. I think the
problem is in our heuristic sampling code. I'm not the first person to have
this kind of problem. Will be following up with tests ...
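
To put a number on the gap, the two queries above can be folded into one. This is only a sketch: it assumes n_distinct is stored as a positive absolute count (as it is in the output above; a negative value would instead mean a fraction of the row count), and the aliases estimated, actual and underestimate_factor are just illustrative names.

```sql
-- Sketch: compare the stored n_distinct estimate with the true distinct count.
-- Assumes n_distinct > 0 (an absolute count), as in the output above.
select s.n_distinct as estimated,
       c.actual,
       -- cast to numeric so that round(value, 2) is available
       round(c.actual / s.n_distinct::numeric, 2) as underestimate_factor
from pg_stats s,
     (select count(distinct session_id) as actual
      from web_site_activity_fa) c
where s.tablename = 'web_site_activity_fa'
  and s.attname = 'session_id';
```

With the figures above, 3174813 / 626127 comes to roughly 5.07. Note that the subquery still has to scan the whole table, so this is a check to run once, not something to put in a hot path.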

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
