Skip site navigation (1)
Skip section navigation (2)
## Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)

### Responses

### pgsql-hackers by date

Zeugswetter Andreas IZ5 <Andreas(dot)Zeugswetter(at)telecom(dot)at> writes: > Other db's usually use the value count(*) / nunique for the light > weight statistics. This makes the assumptoin that the distinct index > values are evenly distributed. That is on average a correct > assumption, whereas our assumption on average overestimates the number > of rows returned. I am not sure we have a nunique info though. We don't, and AFAICS it would be an expensive statistic to compute. I have thought about this a little more overnight, and I have come up with what I think is a better idea. Suppose that VACUUM ANALYZE stores in pg_statistic not only the disbursion, but also the most frequently occurring value of each column. It already computes (or I should say estimates) the most frequently occurring value (MFOV) in order to arrive at the disbursion, so storing the value costs nothing except a little more space in pg_statistic. Now, the logic that eqsel() should use is if constant-being-compared-against == MFOV then return disbursion; else return MIN(disbursion, 1.0 - disbursion); which works like this: if we are indeed looking for the MFOV then the selectivity is just the disbursion, no question. If we are looking for a value *other* than the MFOV, then the selectivity must be less than the disbursion, since surely this value occurs less often than the MFOV. But the total fraction of non-MFOV values in the table is 1.0-disbursion, so the fraction that are the specific value we want can't exceed that either. The MIN() above is therefore a hard upper bound for the selectivity of the non-MFOV case. In practice we might want to multiply the MIN by a fudge-factor somewhat less than one, to arrive at what we hope is a reasonable estimate rather than a worst-case estimate. BTW, this argument proves rigorously that the selectivity of a search for any value other than the MFOV is not more than 0.5, so there is some basis for my intuition that eqsel should not return a value above 0.5. So, in the cases where eqsel does not know the exact value being searched for, I'd still be inclined to cap its result at 0.5. If we use this logic, the stat we really want is exactly the frequency of the MFOV, not the disbursion which is just closely related to it. I have not looked at the other uses of disbursion, but if they all can work like this we might want to forget the statistical niceties and just store the frequency of the MFOV. A final comment is that NULL would be treated just like any regular value in determining what is the MFOV... regards, tom lane

- Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect) at 1999-07-28 16:06:22 from Bruce Momjian

Next: From:Bernard FrankpittDate:1999-07-28 15:02:41Subject: Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple select)Previous: From: Thomas LockhartDate: 1999-07-28 14:51:17Subject: Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)