Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zeugswetter Andreas IZ5 <Andreas(dot)Zeugswetter(at)telecom(dot)at>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
Date: 1999-07-28 16:06:22
Message-ID: 199907281606.MAA11774@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Zeugswetter Andreas IZ5 <Andreas(dot)Zeugswetter(at)telecom(dot)at> writes:
> > Other db's usually use the value count(*) / nunique for the light
> > weight statistics. This makes the assumptoin that the distinct index
> > values are evenly distributed. That is on average a correct
> > assumption, whereas our assumption on average overestimates the number
> > of rows returned. I am not sure we have a nunique info though.
>
> We don't, and AFAICS it would be an expensive statistic to compute.
>
> I have thought about this a little more overnight, and I have come up
> with what I think is a better idea. Suppose that VACUUM ANALYZE stores
> in pg_statistic not only the disbursion, but also the most frequently
> occurring value of each column. It already computes (or I should say
> estimates) the most frequently occurring value (MFOV) in order to arrive
> at the disbursion, so storing the value costs nothing except a little
> more space in pg_statistic. Now, the logic that eqsel() should use is
>
> if constant-being-compared-against == MFOV then
> return disbursion;
> else
> return MIN(disbursion, 1.0 - disbursion);

Yes, I like this.

> BTW, this argument proves rigorously that the selectivity of a search
> for any value other than the MFOV is not more than 0.5, so there is some
> basis for my intuition that eqsel should not return a value above 0.5.
> So, in the cases where eqsel does not know the exact value being
> searched for, I'd still be inclined to cap its result at 0.5.

I don't follow this. If the most frequent value occurs 95% of the time,
wouldn't the selectivity be 0.95? And why use 1-disbursion. You may
find the existing code does a better job than the MIN() computation.

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-07-28 16:07:21 Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
Previous Message Bernard Frankpitt 1999-07-28 15:54:20 Re: [HACKERS] row reuse while UPDATE and vacuum analyze problem