Re: Odd statistics behaviour in 7.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Gordon A(dot) Runkle" <gar(at)integrated-dynamics(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Odd statistics behaviour in 7.2
Date: 2002-02-13 21:34:27
Message-ID: 782.1013636067@sss.pgh.pa.us
Lists: pgsql-hackers

"Gordon A. Runkle" <gar(at)integrated-dynamics(dot)com> writes:
> Would it be fair to say that the correct workaround for now would
> be to use ALTER TABLE SET STATISTICS on columns of interest which have
> this near-unique characteristic?

Yeah, that's probably the best we can do until we can think of a better
estimation equation.

> Does ALTER TABLE SET STATISTICS only increase the histogram size, or
> does it also cause more rows to be sampled?

Both. The Chaudhuri paper I referred to has some math purporting to
prove that the required sample size is directly proportional to the
histogram size, for fixed relative error in the histogram boundaries.
So I made the same parameter control both.
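
As a rough illustration of that proportionality, here is a minimal sketch of the sample-size bound in the form r = 4 * k * ln(2n / gamma) / f^2, which is how the comments in later PostgreSQL analyze.c sources quote the Chaudhuri et al. result. That 7.2 uses exactly this form is an assumption. The function name and the default values for f and gamma are illustrative only.

```python
import math


def required_sample_rows(histogram_bins: int, table_rows: float,
                         rel_error: float = 0.5,
                         fail_prob: float = 0.01) -> float:
    """Sample size r = 4 * k * ln(2n / gamma) / f^2.

    k = histogram_bins, n = table_rows, f = rel_error (relative error in
    bin size), gamma = fail_prob (probability the error bound is exceeded).
    The defaults are assumptions chosen for illustration.
    """
    return (4 * histogram_bins * math.log(2 * table_rows / fail_prob)
            / rel_error ** 2)


# With f and gamma held fixed, the bound grows linearly in the bin count.
for bins in (10, 20, 100):
    print(bins, round(required_sample_rows(bins, 1_000_000)))
```

With these assumed values and a million-row table the bound works out to roughly 306 sampled rows per histogram bin, so doubling the histogram size doubles the sample.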

Actually the sample size is driven by the largest SET STATISTICS value
for any column of the table. So you can pick which one you think a
larger histogram would be most useful for; it doesn't have to be the
same column that's got the bad-number-of-distinct-values problem.
Which columns, if any, do you do range queries on? Those would be the
ones where a bigger histogram would be useful.
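
The "largest value wins" rule can be sketched as below. This is a toy model, not the actual ANALYZE code: the factor of 300 rows per statistics-target unit and the default target of 10 are assumptions based on what PostgreSQL sources of that era appear to use, and the column names are hypothetical.

```python
from typing import Mapping

ROWS_PER_TARGET = 300   # assumed multiplier: sampled rows per target unit
DEFAULT_TARGET = 10     # assumed default per-column statistics target


def analyze_sample_rows(column_targets: Mapping[str, int]) -> int:
    """Rows sampled for the whole table, driven by the largest column target."""
    largest = max(column_targets.values(), default=DEFAULT_TARGET)
    return ROWS_PER_TARGET * largest


# Hypothetical table: only ship_date has had its target raised.
targets = {"order_id": 10, "ship_date": 100, "status": 10}
print(analyze_sample_rows(targets))
```

In this model, raising the target on ship_date alone enlarges the sample used for every column, including a near-unique one such as order_id, which is why the column chosen for the larger histogram can be whichever one benefits most from it.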

regards, tom lane
