Skip site navigation (1) Skip section navigation (2)

Re: best statistic target for boolean columns

From: Gregory Stark <gsstark(at)mit(dot)edu>
To: PostgreSQL Performance List <pgsql-performance(at)postgresql(dot)org>
Subject: Re: best statistic target for boolean columns
Date: 2004-09-27 19:13:45
Message-ID: 87k6uf7ceu.fsf@stark.xeocode.com (view raw or flat)
Thread:
Lists: pgsql-performance
> Gaetano,
> 
> > don't you think the best statistic target for a boolean
> > column is something like 2?  Or in general the is useless
> > have a statistics target > data type cardinality ?
> 
> It depends, really, on the proportionality of the boolean values; if they're 
> about equal, I certainly wouldn't raise Stats from the default of 10.   If, 
> however, it's very dispraportionate -- like 2% true and 98% false -- then it 
> may pay to have better statistics so that the planner doesn't assume 50% 
> hits, which it otherwise might.

No, actually the stats table keeps the n most common values and their
frequency (usually in percentage). So really a target of 2 ought to be enough
for boolean values. In fact that's all I see in pg_statistic; I'm assuming
there's a full histogram somewhere but I don't see it. Where would it be?

However the target also dictates how large a sample of the table to take. A
target of two represents a very small sample. So the estimations could be
quite far off.

I ran the experiment and for a table with 2036 false rows out of 204,624 the
estimate was 1720. Not bad. But then I did vacuum full analyze and got an
estimate of 688. Which isn't so good.

-- 
greg


Responses

pgsql-performance by date

Next:From: Jim C. NasbyDate: 2004-09-27 19:18:36
Subject: Re: Caching of Queries
Previous:From: Mitch PirtleDate: 2004-09-27 18:59:13
Subject: Re: Caching of Queries

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group