Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Problem (bug?) with like

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, drheart(at)wanadoo(dot)es, Lista PostgreSql <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Problem (bug?) with like
Date: 2001-12-28 18:27:26
Message-ID: 200112281827.fBSIRRk28377@candle.pha.pa.us (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-hackers
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > But what about '%A%' vs. '%AC%'.  Seems the second is reasonably
> > > different from the first the our optimizer may be fine with that.  Is it
> > > only when the strings get longer that we lose specificity?
> > 
> > Yeah, I don't think that the estimates are bad for one or two
> > characters.  But the estimate gets real small real fast as you
> > increase the number of match characters in the LIKE pattern.
> > We need to slow that down some.
> 
> Yea, maybe a log base 2 decrease:
> 	
> 	1 char	1x
> 	2 char	2x
> 	4 char	3x
> 	8 char	4x
> 	16 char 5x

I did a little research on this.  I think the problem is that ordinary
characters are assumed to randomly appear in a character string, while
in practice, if the string has already been specified like 'DAV', there
are very few additional characters that can follow it and make sense.

Looking at backend/utils/adt/selfuncs.c, I see this:

	#define FIXED_CHAR_SEL	0.04	/* about 1/25 */
...
	sel *= FIXED_CHAR_SEL;

which means every additional character reduces the selectivity by 96%. 
This seems much too restrictive to me.  Because of the new optimizer
buckets, we do have good statistics on the leading character, but
additional characters drastically reduce selectivity.  I think perhaps a
number like 0.50 or 50% may be correct.

That would be a table like this:

 	1 char	2x
 	2 char	4x
 	4 char	8x
 	8 char	16x
 	16 char 32x

which is more restrictive than I initially suggested above but less
restrictive than we have now.

Should we assume additional characters are indeed randomly appearing in
the string?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

In response to

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2001-12-28 18:35:41
Subject: Re: Problem (bug?) with like
Previous:From: Bruce MomjianDate: 2001-12-28 17:57:53
Subject: Re: TODO question

pgsql-general by date

Next:From: Bruce MomjianDate: 2001-12-28 18:28:55
Subject: Re: First decent PostgreSQL CBT now on techdocs.postgresql.org
Previous:From: Tom LaneDate: 2001-12-28 18:21:19
Subject: Re: First decent PostgreSQL CBT now on techdocs.postgresql.org

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group