Re: Problem (bug?) with like

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: drheart(at)wanadoo(dot)es, Lista PostgreSql <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem (bug?) with like
Date: 2001-12-29 04:55:19
Message-ID: 200112290455.fBT4tLm13381@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > But what about '%A%' vs. '%AC%'. Seems the second is reasonably
> > different from the first the our optimizer may be fine with that. Is it
> > only when the strings get longer that we lose specificity?
>
> Yeah, I don't think that the estimates are bad for one or two
> characters. But the estimate gets real small real fast as you
> increase the number of match characters in the LIKE pattern.
> We need to slow that down some.

OK, I think I have the proper value for FIXED_CHAR_SEL. It is currently
0.04 or 1/26, meaning the letters are random, though this is not usually
the case.

If we assume our optimizer buckets have given us a reasonable value for
the first character, suppose it is an 'F', there are only a few valid
characters after that, at least in English. There are vowels, and a few
consonants, and given that character, there are only a few characters
that can be valid after that. To my thinking, it is two characters that
represent the same distribution as one random character, leaving 0.20 as
the proper value for FIXED_CHAR_SEL because 0.20 * 0.20 is the same as
0.04.

Added to TODO:

* Change FIXED_CHAR_SEL to 0.20 from 0.04 to give better selectivity (Bruce)

If people think there is a better value for this, please chime in.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2001-12-29 05:08:12 Re: Problem (bug?) with like
Previous Message Thomas Lockhart 2001-12-29 03:39:42 date/time formats in 7.2

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-12-29 05:08:12 Re: Problem (bug?) with like
Previous Message Bruce Momjian 2001-12-29 03:51:06 Re: TODO question