Skip site navigation (1)
Skip section navigation (2)
## Re: best statistic target for boolean columns

### In response to

### Responses

### pgsql-performance by date

Tom Lane wrote: > Gregory Stark <gsstark(at)mit(dot)edu> writes: > >>No, actually the stats table keeps the n most common values and their >>frequency (usually in percentage). So really a target of 2 ought to be enough >>for boolean values. In fact that's all I see in pg_statistic; I'm assuming >>there's a full histogram somewhere but I don't see it. Where would it be? > > > It's not going to be there. The histogram only covers values that are > not in the most-frequent-values list, and therefore it won't exist for a > column that is completely describable by most-frequent-values. > > >>However the target also dictates how large a sample of the table to take. A >>target of two represents a very small sample. So the estimations could be >>quite far off. > > > Right. The real point of stats target for such columns is that it > determines how many rows to sample, and thereby indirectly implies > the accuracy of the statistics. For a heavily skewed boolean column > you'd want a high target so that the number of occurrences of the > infrequent value would be estimated accurately. > > It's also worth noting that the number of rows sampled is driven by the > largest per-column stats target in the table, and so reducing stats > target to 2 for a boolean column will save *zero* effort unless all the > columns in the table are booleans. Thank you all, now I have more clear how it works. Btw last time I was thinking: why during an explain analyze we can not use the information on about the really extracted rows vs the extimated rows ? Now I'm reading an article, written by the same author that ispired the magic "300" on analyze.c, about "Self-tuning Histograms". If this is implemented, I understood we can take rid of "vacuum analyze" for mantain up to date the statistics. Have someone in his plans to implement it ? After all the idea is simple: compare during normal selects the extimated rows and the actual extracted rows then use this "free" information to refine the histograms. Regards Gaetano Mendola

- Re: best statistic target for boolean columns at 2004-09-27 19:26:32 from Tom Lane

- Re: best statistic target for boolean columns at 2004-09-27 23:43:45 from Neil Conway

Next: From:Neil ConwayDate:2004-09-27 23:43:45Subject: Re: best statistic target for boolean columnsPrevious: From: Matt ClarkDate: 2004-09-27 21:41:52Subject: Re: Caching of Queries