More On 7.2 Distributions - Estimates For Number Distinct < 0

From: Mark kirkwood <markir(at)slingshot(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: More On 7.2 Distributions - Estimates For Number Distinct < 0
Date: 2001-10-29 09:18:38
Message-ID: 01102922183801.04563@spikey.slithery.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In the process of attempting to understand the data in pg_stats, I created a
(very) simple example :

CREATE TABLE test(id integer);

INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(3);
INSERT INTO test VALUES(4);
INSERT INTO test VALUES(5);

ANALYZE test;

SELECT * FROM pg_stats WHERE tablename='test';

tablename test
attname id
null_frac 0
avg_width 4
n_distinct -0.5
most_common_vals {"1","2"}
most_common_vals {"0.4","0.3"}
histogram_bounds {"3","4","5"}
correlation 1

everything looks good except for n_distinct ( its negative - should be 5)
(I wasn't too worried about avg_width )

Using fairly crude tracing (adding elog calls) in
src/backend/commands/analyze.c :

DEBUG: Analyzing test
DEBUG: Analyze : beginning a column
DEBUG: Have 10 total values in relation (totalrows)
DEBUG: Have 10 values in relation (numrows)
DEBUG: Have 10 values in sample (values_cnt)
DEBUG: Have 5 distinct values in sample (ndistinct)
DEBUG: Have 2 multiple values in sample (nmultiple)
DEBUG: calc 5.000000 distinct via Chaudhuri rule
DEBUG: calc -0.500000 distinct via >10 percent rowcount rule

So we had the correct answer before applying the 10 percent rowcount code.

This 10 percent rowcount code being line 1340 or thereabouts :

if (stats->stadistinct > 0.1 * totalrows)
{
stats->stadistinct = -(stats->stadistinct / totalrows);
}

My example is pretty contrived, but I wonder if I have "stumbled" on a bug
here.

regards

Mark

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andy Hallam 2001-10-29 09:58:04 SELECT with backslash '\' character
Previous Message Mark kirkwood 2001-10-29 08:53:56 Re: On Distributions In 7.2