Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple select)

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ole Gjerde <gjerde(at)icebox(dot)org>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple select)
Date: 1999-07-27 22:58:01
Message-ID: 199907272258.SAA19324@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > It looks like the highly skewed distribution of nsn values (what you
> > sent me had 997 '' entries, only 3 non-empty strings) is confusing the
> > selectivity estimation code somehow, such that the system thinks that
> > the query is going to match most of the rows. Notice it is estimating
> > 995 returned rows for the nsn select! Under these circumstances it will
> > prefer a sequential scan, since the more-expensive-per-tuple index scan
> > doesn't look like it will be able to avoid reading most of the table.
> > That logic is OK, it's the 0.995 selectivity estimate that's wrong...
>
> It turns out that the selectivity estimate for an "=" comparison is just
> the attdisbursion statistic calculated by VACUUM ANALYZE, which can be
> roughly defined as the frequency of the most common value in the column.
> (I took statistics too long ago to recall the exact definition.)
> Anyway, given that the test data Ole sent me contains nearly all ''
> entries, I'd say that the 0.995 value is about right for disbursion.

Yes, you are correct, though it does look at potentially one or two
other unique values, depending on the distribution. It basically
perfectly computes disbursion for unique columns, and columns that
contain only two unique values, and it figures in NULL. In other cases,
the disbursion is imperfect, but pretty decent.

> My inclination is to hack up eqsel() to never return a selectivity
> estimate larger than, say, 0.5, even when the measured disbursion
> is more. I am not sure that this is a good idea, however. Comments?

I would discourage this. I can imagine many cases there >0.5
selectivites would be valid, i.e. state = "PA".

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 1999-07-27 23:04:58 # of Index' Tuples != Heap'
Previous Message Ross J. Reedstrom 1999-07-27 22:27:52 Re: [HACKERS] Checking if a system is ELF