> Comments? Is there a better way? What's the best probability to use?
For this particular example, "partial indices" seem to be the best fit.
The index can be chosen to omit the most common value(s), since a query
on those values would end up as a sequential scan anyway.
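
As a rough illustration of the idea (a sketch only, run against SQLite
through Python's sqlite3 module as a stand-in rather than Postgres; the
table, column, and index names are all made up for the example):

```python
import sqlite3

# Hypothetical table: one very common value ('done') and one rare one ('pending').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT)")
rows = [("done",)] * 9000 + [("pending",)] * 100
conn.executemany("INSERT INTO t (status) VALUES (?)", rows)

# Partial index: rows holding the most common value are left out of the index.
conn.execute(
    "CREATE INDEX t_status_partial ON t (status) WHERE status <> 'done'"
)

def plan(sql):
    """Return the planner's description of how it would run the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# SQLite only considers a partial index when the query's WHERE clause
# visibly implies the index predicate, so the predicate is repeated here.
rare_sql = "SELECT id FROM t WHERE status <> 'done' AND status = 'pending'"
common_sql = "SELECT id FROM t WHERE status = 'done'"

rare_plan = plan(rare_sql)
common_plan = plan(common_sql)
rare_count = len(conn.execute(rare_sql).fetchall())

print("rare value:  ", rare_plan)
print("common value:", common_plan)
```

The lookup for the common value cannot use the index at all, since those
rows were never put into it, which is the point: the index stays small
and only serves the queries that benefit from it.
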
Other DBs allow a parameter to set the "fill ratio" of index pages,
which might also help, though probably not as much as you might like
when doing a large number of inserts at a time.
Your "randomized" algorithm looks very promising. What is the status of
partial indices? Are they functional now, or have they been broken
forever (I don't recall)?
- Thomas